GitHub - cmbr/feed-scraping: Collection of feed scraping scripts

gocomics-scrape.py

GoComics.com (which recently merged with Comics.com) has stopped linking to or updating their RSS feeds.

This script fetches a GoComics.com strip homepage, generates strip URLs, and then for each one looks up the actual comic image and outputs a minimal Atom feed with the image. Sample usage:

python gocomics-scrape.py frazz > ~/www/scraped/frazz.xml

I've put something like that in a cron job that runs once an hour.
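The last step described above (turning a list of strips into a minimal Atom feed) could look roughly like the sketch below. This is not the code from gocomics-scrape.py: the function name `build_atom_feed`, the tuple layout of `strips`, and the example.com URLs are all assumptions made for illustration, and the sketch targets Python 3 with only the standard library. It does no fetching or scraping; it only shows the feed generation.

```python
import datetime
import html
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _atom(tag):
    """Return the namespace-qualified name for an Atom element."""
    return "{%s}%s" % (ATOM_NS, tag)


def _timestamp(date):
    """Format a date as an RFC 3339 timestamp at midnight UTC."""
    return "%sT00:00:00Z" % date.isoformat()


def build_atom_feed(title, feed_url, strips):
    """Build a minimal Atom feed (hypothetical helper, not the script's own).

    `strips` is a list of (date, page_url, image_url) tuples, where `date`
    is a datetime.date. Returns the feed as a str.
    """
    # Serialize Atom elements without a prefix (xmlns="...").
    ET.register_namespace("", ATOM_NS)

    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("title")).text = title
    ET.SubElement(feed, _atom("id")).text = feed_url
    newest = max((s[0] for s in strips), default=datetime.date.today())
    ET.SubElement(feed, _atom("updated")).text = _timestamp(newest)

    for date, page_url, image_url in strips:
        entry = ET.SubElement(feed, _atom("entry"))
        ET.SubElement(entry, _atom("title")).text = date.isoformat()
        ET.SubElement(entry, _atom("id")).text = page_url
        ET.SubElement(entry, _atom("link"), href=page_url)
        ET.SubElement(entry, _atom("updated")).text = _timestamp(date)
        # The entry body is just the comic image, embedded as HTML.
        content = ET.SubElement(entry, _atom("content"), type="html")
        content.text = '<img src="%s"/>' % html.escape(image_url, quote=True)

    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    # Placeholder data; a real run would use scraped strip and image URLs.
    sample = [(
        datetime.date(2014, 1, 2),
        "http://example.com/frazz/2014/01/02",
        "http://example.com/images/frazz-2014-01-02.gif",
    )]
    print(build_atom_feed("Frazz", "http://example.com/frazz.xml", sample))
```

Putting the image in an HTML `content` element is what makes the feed "full content": a feed reader can show the strip inline instead of linking out to the site.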

Incidentally, Frazz and Calvin and Hobbes are the comics that I wanted this for, so if you're looking for full-content feeds for them, they can be found at http://persistent.info/scraped/frazz.xml and http://persistent.info/scraped/calvinandhobbes.xml.

daily-puppy-scrape.py

The Daily Puppy ostensibly has an RSS feed. However, it has not worked since early January 2014 (the contents are empty). Given that the site also has references to iGoogle (shut down on November 1, 2013), it doesn't seem like it's being maintained from a technical perspective. This script scrapes the 10 most recent puppies and generates a (full-content) feed for them (it uses the same XML endpoints as the iOS app).

The result is placed at http://persistent.info/scraped/daily-puppy.xml. If the site fixes its official feed, I will redirect this URL back to the official feed.

