
EPA Publication List: NSCEP #359

@hkuchampudi


EPA Publication List

  • Agency: Environmental Protection Agency
  • Agency Division: National Service Center for Environmental Publications (NSCEP)
  • Data Type: Various
  • Data Format: PDF

I have mined the metadata for the EPA Publication List and hosted the direct download links to the PDFs in my repository. I need help mining the documents themselves, as I do not have the space to download them.

Downloading the Documents

After downloading the directLinks.txt file, you can download the files in bulk by running the following command, replacing the placeholders with the appropriate values:

awk 'FNR>=[Starting_Line_Number] && FNR<=[Ending_Line_Number]' [Links_Location] | while IFS= read -r link; do wget -t 10 -T 10 -U "Mozilla" "$(printf '%s' "$link" | tr -d '\r')"; done
  • [Starting_Line_Number] with the line number of the first link to download
  • [Ending_Line_Number] with the line number of the last link to download
  • [Links_Location] with the path to the downloaded directLinks.txt file from the repository
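For longer runs, the one-liner above can be wrapped in a small script so that an interrupted batch can be restarted without re-fetching files that already exist. This is a minimal sketch, not part of the original instructions: the script name download_range.sh, the functions extract_range and download_range, and the optional destination-directory argument are all hypothetical. It keeps the same wget retry (-t 10), timeout (-T 10), and user-agent (-U "Mozilla") flags, and adds wget's -nc (no-clobber) and -P (directory prefix) options as an assumption about what a resumable download should look like.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (download_range.sh) around the awk | wget one-liner.
# Usage: ./download_range.sh START END directLinks.txt [DEST_DIR]
set -euo pipefail

# Print the links on lines START..END of FILE, stripping trailing
# Windows carriage returns and skipping blank lines.
extract_range() {
  local start=$1 end=$2 file=$3
  awk -v s="$start" -v e="$end" \
    'FNR>=s && FNR<=e { sub(/\r$/, ""); if (length($0)) print }' "$file"
}

# Download every link in the range into DEST_DIR (default: current directory).
download_range() {
  local start=$1 end=$2 file=$3 dest=${4:-.}
  extract_range "$start" "$end" "$file" | while IFS= read -r link; do
    # Same retry/timeout/user-agent flags as the one-liner;
    # -nc skips files that were already downloaded, -P sets the output directory.
    # A failed download is reported on stderr instead of stopping the batch.
    wget -nc -t 10 -T 10 -U "Mozilla" -P "$dest" "$link" || echo "failed: $link" >&2
  done
}

# Only start downloading when invoked with at least START END FILE.
if [ "$#" -ge 3 ]; then
  download_range "$@"
fi
```

For example, `./download_range.sh 1 1000 directLinks.txt pdfs` would fetch the first 1000 links into a pdfs directory; rerunning the same command after an interruption should only fetch the files that are still missing.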

Download Information

  • Number of links/documents: 75973
  • Estimated total file size: 16.386551273 GB
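With 75973 links in total, one way to share the work is to give each volunteer a contiguous block of line numbers to plug into the [Starting_Line_Number] and [Ending_Line_Number] placeholders of the download command. The sketch below is a hypothetical helper (split_ranges is not from the repository) that uses ceiling division so that every line is covered exactly once, with the last block absorbing any remainder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Split TOTAL lines into PARTS contiguous, roughly equal ranges.
# Prints one "START END" pair per line (1-indexed, inclusive).
split_ranges() {
  local total=$1 parts=$2
  local size=$(( (total + parts - 1) / parts ))  # ceiling division
  local start=1 end
  while (( start <= total )); do
    end=$(( start + size - 1 ))
    if (( end > total )); then
      end=$total  # the last block may be shorter
    fi
    echo "$start $end"
    start=$(( end + 1 ))
  done
}
```

For instance, `split_ranges 75973 10` produces ten ranges of 7598 lines each except the last, starting with `1 7598` and ending with `68383 75973`.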
