Commit c5b6199

Commit message: update doc

Parent: 67c1dc6

3 files changed: 8 additions, 5 deletions


Readme.md

Lines changed: 7 additions & 2 deletions
```diff
@@ -1,7 +1,12 @@
 # Webscraper
 A collection of scripts for web scraping content.
 
-## 1 Presidency Scraper
+## Content
+
+### 1 Presidency Scraper
 The PresidencyScraper is a script for web scraping from www.presidency.ucsb.edu.
 
-[View Presidency Scraper Documentation](presidencyScraper)
+[View Presidency Scraper](presidencyScraper)
+
+## License
+This project is licensed under the GNU General Public License v3.0. See the [LICENSE](./LICENSE.txt) file for details.
```

presidencyScraper/Readme.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -2,7 +2,7 @@
 The PresidencyScraper is a class for web scraping from www.presidency.ucsb.edu. This class provides methods to scrape speeches, save the content in various formats (JSON, TXT, CSV, Excel), and filter the scraped data based on include and exclude criteria. It also logs the scraping process and handles directory and file management for the scraped data.
 
 ## 1 Getting started
-Change the url variable (and timeout, include/exclude) at the bottom and run the script.
+Change the url variable (and timeout, include/exclude) at the bottom and run the script. The script is thoroughly tested with Python 3.12.
 
 ```
 if __name__ = '__main__':
@@ -11,7 +11,6 @@ if __name__ = '__main__':
     scraper = PresidencyScraper(url, timeout=1.5)
     scraper.scrape(limit=20)
 ```
-The script is thoroughly tested with Python 3.12.
 
 ## 2 Documentation
 
````

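The class README says PresidencyScraper can filter scraped data on include and exclude criteria, but this commit does not show how the matching works. The sketch below is a minimal, hypothetical illustration, assuming each scraped speech is a dict with a `title` key and that criteria are case-insensitive substring matches; `filter_speeches` and `_matches_any` are illustrative names, not the class's actual methods. Its entry-point guard uses `==`, which Python requires in an `if __name__ == '__main__':` check (a single `=` is a syntax error).

```python
from typing import Iterable, Sequence


def _matches_any(text: str, terms: Sequence[str]) -> bool:
    """Hypothetical helper: True if any term occurs in text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def filter_speeches(
    speeches: Iterable[dict],
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    field: str = "title",
) -> list[dict]:
    """Keep speeches whose `field` matches an include term (all of them if include is empty),
    then drop any whose `field` matches an exclude term.

    Assumption: the real PresidencyScraper may match on other fields or use other rules.
    """
    kept = []
    for speech in speeches:
        value = speech.get(field, "")
        if include and not _matches_any(value, include):
            continue  # no include term matched
        if exclude and _matches_any(value, exclude):
            continue  # an exclude term matched
        kept.append(speech)
    return kept


if __name__ == "__main__":
    sample = [
        {"title": "Inaugural Address", "text": "..."},
        {"title": "Remarks at a Press Conference", "text": "..."},
        {"title": "State of the Union Address", "text": "..."},
    ]
    result = filter_speeches(sample, include=["address"], exclude=["union"])
    print([s["title"] for s in result])  # prints ['Inaugural Address']
```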
presidencyScraper/presidencyScraper.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -18,7 +18,6 @@
 # along with PresidencyScraper. If not, see <http://www.gnu.org/licenses/>.
 
 
-
 from datetime import datetime
 from collections import defaultdict
 import json
```

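The hunk above shows the module importing `datetime` and `json`, and the class README describes saving scraped content in several formats and managing output directories. As a hedged sketch of those duties using only the standard library, `save_records` (a hypothetical name) writes a list of record dicts to JSON and CSV inside a timestamped run directory. TXT output is omitted here, and Excel output would typically need a third-party package such as openpyxl or pandas; the real class may lay out its files differently.

```python
from __future__ import annotations

import csv
import json
from datetime import datetime
from pathlib import Path


def save_records(records: list[dict], base_dir: str | Path, name: str = "speeches") -> Path:
    """Hypothetical helper: write records to <base_dir>/<timestamp>/<name>.json and .csv.

    Returns the run directory. Not the actual PresidencyScraper implementation.
    """
    # One sub-directory per run, named by a sortable timestamp.
    run_dir = Path(base_dir) / datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)

    # JSON keeps the records as-is; ensure_ascii=False preserves non-ASCII characters.
    with open(run_dir / f"{name}.json", "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)

    # CSV needs a fixed header: use the union of keys in first-seen order.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # Missing keys are written as empty cells (DictWriter's default restval is "").
    with open(run_dir / f"{name}.csv", "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)

    return run_dir
```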