# First run
1.7M warc_cache/warcs/book.pythontips.com.warc.gz
# Second run, exact same code
516K warc_cache/warcs/book.pythontips.com.warc.gz
# Deleted the dedupe file but not the WARC file
1.7M warc_cache/warcs/book.pythontips.com.warc.gz
It looks like the dedupe file is reused between runs, but the WARC file is created from scratch each time. That's definitely not what I would expect. Is that how it's supposed to work? If you're recreating the WARC file, shouldn't you be recreating the dedupe DB as well?
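To make the expectation concrete, here is a rough sketch of the behavior I had in mind. It is not the project's actual code: `reset_stale_dedupe`, its parameters, and the idea of an `overwrite_warc` flag are all hypothetical names made up for illustration. The only assumption is that the dedupe DB lives in its own file next to the WARC.

```python
from pathlib import Path


def reset_stale_dedupe(warc_path: Path, dedupe_path: Path, overwrite_warc: bool) -> bool:
    """Remove the dedupe DB when the WARC is about to be rewritten from scratch.

    Hypothetical helper, not part of any real API.
    Returns True if a stale dedupe file was deleted.
    """
    if overwrite_warc and dedupe_path.exists():
        # A freshly created WARC holds no records yet, so dedupe entries
        # left over from earlier runs would refer to payloads that are
        # no longer in the file.
        dedupe_path.unlink()
        return True
    # Appending to an existing WARC (or no DB present): leave things alone.
    return False
```

Calling something like this right before the WARC is opened for writing would keep the two files in sync, so every run that starts a new WARC also starts with an empty dedupe DB.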