applications4android edited this page Dec 25, 2012 · 1 revision

Q0. Why not use RSS feeds instead of HTML screen scraping?

RSS feeds provide a robust way to detect changes on a given website. Unfortunately, they are an optional feature, and not every website owner offers one. Comic web pages, however, all follow certain patterns. Each has a consistent way of going to the next, previous, or random comic, and each has a first comic (of course). So it is far better (and easier to code) to parse the HTML pages directly.

Q1. I see that you use regex to parse the HTML pages. Why not use an HTML parser like HTML Cleaner?

We started out using HTML parser libraries like HTML Cleaner, but the parsing took a long time to finish. Parsing, rather than the download of the comic web page, became the bottleneck. Hence, we had to sacrifice some robustness to achieve the desired speed. After all, it doesn't make sense to wait about a minute to read your favorite strip, right?

That said, if you have better ideas for keeping the speed while making the parsing more robust, please do Talk To Us.
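
To illustrate the regex approach described above, here is a minimal sketch in Python. It is not the project's actual code: the sample markup, the patterns, and the names `SAMPLE_HTML`, `first_match`, and `parse_strip` are all hypothetical, and a real comic site would need patterns written for its own HTML.

```python
import re

# Hypothetical sample page; real comic sites use different markup.
SAMPLE_HTML = """
<div id="comic"><img src="http://example.com/comics/strip_42.png" alt="Strip 42"></div>
<a rel="prev" href="/41/">Previous</a>
<a rel="next" href="/43/">Next</a>
"""

# Assumed patterns for the sample above. In practice each site needs its own set.
IMG_RE = re.compile(r'<div id="comic">\s*<img[^>]*\bsrc="([^"]+)"', re.IGNORECASE)
PREV_RE = re.compile(r'<a[^>]*\brel="prev"[^>]*\bhref="([^"]+)"', re.IGNORECASE)
NEXT_RE = re.compile(r'<a[^>]*\brel="next"[^>]*\bhref="([^"]+)"', re.IGNORECASE)


def first_match(pattern, html):
    """Return the first captured group, or None if the pattern does not match."""
    match = pattern.search(html)
    return match.group(1) if match else None


def parse_strip(html):
    """Pull the strip image URL and the previous/next links out of one page."""
    return {
        "image": first_match(IMG_RE, html),
        "previous": first_match(PREV_RE, html),
        "next": first_match(NEXT_RE, html),
    }


if __name__ == "__main__":
    print(parse_strip(SAMPLE_HTML))
```

The trade-off mentioned above shows up here: the patterns only scan for the few pieces of the page that matter, but they silently return `None` if the site changes its markup, for example by reordering attributes.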

Q2. Why not set up a server to store the comics, instead of HTML parsing?

The moment we start storing comics, many of the content owners (read: comic authors) will not be happy. I'm sure there are ways to work around this, but we currently have the manpower neither to invest in this effort nor to fight such issues.

Q3. How do I add a new comic?

Please refer to our page on adding-new-comic.
