cmbr/feed-scraping

gocomics-scrape.py

GoComics.com (which recently merged with Comics.com) has stopped linking to or updating its RSS feeds.

This script fetches a GoComics.com strip homepage, generates strip URLs, looks up the actual comic image for each one, and outputs a minimal Atom feed containing those images. Sample usage:

python gocomics-scrape.py frazz > ~/www/scraped/frazz.xml

I've put something like that in a cron job that runs once an hour.
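The steps described above (strip URLs, image lookup, Atom output) can be sketched with only the Python standard library. This is not the script's actual code: the `/<slug>/YYYY/MM/DD` strip URL pattern, reading the image from an `og:image` meta tag, and the helper names `strip_urls`, `find_image_url` and `build_atom_feed` are assumptions made for illustration. Network fetching is left out so each piece can be tried offline.

```python
import datetime
import html
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def strip_urls(slug, end_date, days=7):
    """Return strip URLs for the `days` days ending at `end_date`, newest first.

    Assumption: strips live at https://www.gocomics.com/<slug>/YYYY/MM/DD.
    """
    urls = []
    for offset in range(days):
        day = end_date - datetime.timedelta(days=offset)
        urls.append(f"https://www.gocomics.com/{slug}/{day:%Y/%m/%d}")
    return urls


class _OgImageParser(HTMLParser):
    """Records the first <meta property="og:image"> content (assumed image location)."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.image_url is None and tag == "meta" and attrs.get("property") == "og:image":
            self.image_url = attrs.get("content")


def find_image_url(page_html):
    """Return the strip image URL from a strip page's HTML, or None if not found."""
    parser = _OgImageParser()
    parser.feed(page_html)
    return parser.image_url


def _q(name):
    # Qualify a tag name with the Atom namespace.
    return f"{{{ATOM_NS}}}{name}"


def build_atom_feed(title, feed_id, author, entries, updated):
    """entries: iterable of (page_url, image_url, entry_updated); datetimes must be tz-aware."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(_q("feed"))
    ET.SubElement(feed, _q("title")).text = title
    ET.SubElement(feed, _q("id")).text = feed_id
    ET.SubElement(feed, _q("updated")).text = updated.isoformat()
    ET.SubElement(ET.SubElement(feed, _q("author")), _q("name")).text = author
    for page_url, image_url, entry_updated in entries:
        entry = ET.SubElement(feed, _q("entry"))
        ET.SubElement(entry, _q("title")).text = page_url
        ET.SubElement(entry, _q("id")).text = page_url
        ET.SubElement(entry, _q("link"), href=page_url)
        ET.SubElement(entry, _q("updated")).text = entry_updated.isoformat()
        # type="html": the HTML markup is stored as escaped text (ElementTree escapes it).
        content = ET.SubElement(entry, _q("content"), type="html")
        content.text = f'<img src="{html.escape(image_url)}">'
    return ET.tostring(feed, encoding="unicode")
```

Wiring these together (fetch each strip page, pass it to `find_image_url`, then print the result of `build_atom_feed`) and redirecting stdout to a file matches the cron-style usage described above.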

Incidentally, Frazz and Calvin and Hobbes are the comics I wanted this for, so if you're looking for full-content feeds for them, they can be found at http://persistent.info/scraped/frazz.xml and http://persistent.info/scraped/calvinandhobbes.xml.

daily-puppy-scrape.py

The Daily Puppy ostensibly has an RSS feed. However, it has not worked since early January 2014 (its contents are empty). Given that the site still references iGoogle (shut down on November 1, 2013), it doesn't seem to be maintained from a technical perspective. This script scrapes the 10 most recent puppies and generates a full-content feed for them (it uses the same XML endpoints as the iOS app).

The result is placed at http://persistent.info/scraped/daily-puppy.xml. If the site fixes its official feed, I will redirect this URL back to the official feed.
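Picking the 10 most recent puppies out of an XML response might look like the sketch below. The endpoints' real schema is not described here, so the `<puppy>`, `<name>`, `<date>` and `<story>` element names, the ISO `YYYY-MM-DD` date format and the `most_recent_puppies` helper are all hypothetical; adjust them to whatever the endpoints actually return.

```python
import heapq
from xml.etree import ElementTree as ET


def most_recent_puppies(xml_text, limit=10):
    """Return up to `limit` puppies, newest first.

    Hypothetical schema:
    <puppies><puppy><name/><date>YYYY-MM-DD</date><story/></puppy>...</puppies>
    """
    root = ET.fromstring(xml_text)
    puppies = [
        {
            "name": puppy.findtext("name", default=""),
            "date": puppy.findtext("date", default=""),
            # Keeping the whole story text is what makes the resulting feed full-content.
            "story": puppy.findtext("story", default=""),
        }
        for puppy in root.iter("puppy")
    ]
    # ISO 8601 dates compare correctly as plain strings, so no date parsing is needed.
    return heapq.nlargest(limit, puppies, key=lambda p: p["date"])
```

`heapq.nlargest` avoids sorting everything when a response holds many more entries than the feed needs; for a list this small, `sorted(puppies, key=..., reverse=True)[:limit]` would work just as well.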

About

Collection of feed scraping scripts

Languages

  • Python 100.0%