Releases: SojaSurfer/Webscraper
1.1 Removed Interviews
Raw result of Web scraping
Result from scraping presidency.ucsb.edu on 10.11.2024. It contains 1073 speeches.
- The search url was https://www.presidency.ucsb.edu/advanced-search?field-keywords=&field-keywords2=&field-keywords3=&from%5Bdate%5D=01-01-2008&to%5Bdate%5D=11-08-2024&person2=&category2%5B0%5D=63&items_per_page=100&f%5B0%5D=field_docs_attributes%3A205
- The speakers included were ['John McCain', 'Barack Obama', 'Mitt Romney', 'Hillary Clinton', 'Donald J. Trump', 'Joseph R. Biden, Jr.', 'Kamala Harris'].
- Results whose title contains the substring 'Press Release' were excluded.
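The search parameters and filters listed above can be illustrated with a minimal sketch. This is not the scraper's actual code: the helper names `decode_query` and `keep_result` are hypothetical, and the case-sensitive substring match is an assumption. Only the URL, the speaker list, and the 'Press Release' rule come from this release description.

```python
from urllib.parse import parse_qs, urlsplit

# Search URL as given in this release description.
SEARCH_URL = (
    "https://www.presidency.ucsb.edu/advanced-search"
    "?field-keywords=&field-keywords2=&field-keywords3="
    "&from%5Bdate%5D=01-01-2008&to%5Bdate%5D=11-08-2024"
    "&person2=&category2%5B0%5D=63&items_per_page=100"
    "&f%5B0%5D=field_docs_attributes%3A205"
)

# Speaker list as given in this release description.
SPEAKERS = {
    "John McCain", "Barack Obama", "Mitt Romney", "Hillary Clinton",
    "Donald J. Trump", "Joseph R. Biden, Jr.", "Kamala Harris",
}


def decode_query(url: str) -> dict[str, str]:
    """Return the non-empty query parameters of `url`, percent-decoded."""
    # parse_qs drops parameters with blank values by default.
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def keep_result(speaker: str, title: str) -> bool:
    """Hypothetical filter: known speaker and no 'Press Release' in the title.

    Assumption: the substring match is case-sensitive.
    """
    return speaker in SPEAKERS and "Press Release" not in title


if __name__ == "__main__":
    for key, value in decode_query(SEARCH_URL).items():
        print(f"{key} = {value}")
    print(keep_result("Barack Obama", "Remarks in Chicago"))
```

Decoding the query makes the date range (`from[date]`, `to[date]`) and the page size (`items_per_page`) easier to read than the percent-encoded URL.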
The zip file includes
- a folder containing all speeches as plain txt files
- a metadata table with one row per speech, as both a CSV and an Excel file
- a txt file listing all URLs that were scraped
- a quick png visualization of the metadata

