farchy/rss-article-crawler

rss-article-crawler

RSS-driven article crawler and scraper.
To get started, first install the base requirements:

pip install -r base_requirements.txt

If JPype fails to install through pip, try the system package instead (Debian/Ubuntu):

sudo apt-get install python-jpype

And then install the rest of the requirements:

pip install -r requirements.txt

Next, add a seed list of RSS feeds to resources/rss.txt,
change into the src folder, and run:

python webCrawler.py
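
To illustrate what "RSS driven" means here, the sketch below reads a seed list and pulls article links out of an RSS 2.0 document using only the standard library. It is not taken from webCrawler.py: the function names `read_seed_feeds` and `article_links`, the one-URL-per-line seed format, and the `#` comment convention are all assumptions made for the example, and the repository's actual file format and logic may differ.

```python
import xml.etree.ElementTree as ET


def read_seed_feeds(text):
    """Return feed URLs from seed-file text (assumed: one URL per line)."""
    feeds = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comments (a convention assumed here,
        # not confirmed by the repository).
        if line and not line.startswith("#"):
            feeds.append(line)
    return feeds


def article_links(rss_xml):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


# Hypothetical feed content, used only to exercise the functions offline.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><link>https://example.com/a</link></item>
    <item><title>Second</title><link>https://example.com/b</link></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    print(read_seed_feeds("https://example.com/feed.xml\n\n# a comment\n"))
    print(article_links(SAMPLE_RSS))
```

In a real crawl, each seed URL would be fetched over HTTP and each extracted link passed on to the scraper. Both steps are left out here to keep the sketch self-contained.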

Dependencies:

The scraper is based on Boilerpipe's HTML ArticleExtractor.
