EPA Publication List
- Agency: Environmental Protection Agency
- Agency Division: National Service Center for Environmental Publications (NSCEP)
- Data Type: Various
- Data Format: PDF
I have mined the metadata for the EPA Publication List and have hosted the direct download links to the PDFs in my repository. I need help mining the documents themselves as I do not have the space to download them.
Downloading the Documents
After downloading the directLinks.txt file, you can run the following command to download the files in bulk, replacing the placeholders listed below it with the appropriate values:
awk 'FNR>=[Starting_Line_Number] && FNR<=[Ending_Line_Number]' [Links_Location] | tr -d '\r' | while IFS= read -r link; do wget -t 10 -T 10 -U "Mozilla" "$link"; done
- [Starting_Line_Number] with the line number of the first link to download
- [Ending_Line_Number] with the line number of the last link to download
- [Links_Location] with the path to the downloaded directLinks.txt file from the repository
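For repeated use, the one-liner above could be wrapped in two small shell functions. This is only a sketch: `select_links` and `download_range` are hypothetical helper names, and the optional destination directory argument is an assumption, not something the repository provides. The wget options are the same as in the command above, plus `-P` to choose where files are saved.

```shell
#!/usr/bin/env bash
# Sketch only: select_links and download_range are hypothetical helpers.

# Print lines START..END of FILE with carriage returns stripped.
select_links() {
  local start="$1" end="$2" file="$3"
  awk -v s="$start" -v e="$end" 'FNR >= s && FNR <= e' "$file" | tr -d '\r'
}

# Download every link in the range into DEST (default: current directory).
download_range() {
  local start="$1" end="$2" file="$3" dest="${4:-.}"
  mkdir -p "$dest"
  select_links "$start" "$end" "$file" | while IFS= read -r link; do
    # Skip blank lines so wget is never called without a URL.
    if [ -n "$link" ]; then
      wget -t 10 -T 10 -U "Mozilla" -P "$dest" "$link"
    fi
  done
}
```

For example, `download_range 1 1000 directLinks.txt pdfs` would fetch the first 1000 links into a `pdfs` directory. Quoting `"$link"` keeps URLs containing characters such as `?` or `*` from being expanded by the shell.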
Download Information
| Property | Value |
|---|---|
| Number of links/documents | 75973 |
| Estimated total filesize | 16.386551273 GB |
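Since the request is to share the work because of limited storage, the link list could be split into fixed-size line ranges, one per volunteer or per run. The sketch below prints start and end line numbers for each batch. `batch_ranges` is a hypothetical helper name, and the batch size is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch only: batch_ranges is a hypothetical helper.

# Print "START END" line-number pairs covering 1..TOTAL in chunks of SIZE.
batch_ranges() {
  local total="$1" size="$2" start=1 end
  while [ "$start" -le "$total" ]; do
    end=$(( start + size - 1 ))
    # The last batch may be shorter than SIZE.
    if [ "$end" -gt "$total" ]; then
      end="$total"
    fi
    echo "$start $end"
    start=$(( end + 1 ))
  done
}
```

Calling `batch_ranges 75973 10000` yields eight ranges, the last being `70001 75973`. Each pair can be used as [Starting_Line_Number] and [Ending_Line_Number] in the download command. If the size estimate in the table holds and file sizes are roughly uniform (an assumption), a batch of 10000 links would come to roughly 2 GB.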