

mistydemeo commented Aug 7, 2025

This release contains a bugfix for the MIME type filter and a minor breaking change to the crawl log format.

This release drops support for Python 3.8. The minimum version is now 3.9.

  • The crawl log format previously wrote a null content type as the single character "-", the same placeholder it uses for other null fields. This is inconsistent with Heritrix, which writes a null content type as the string "unknown". This release switches to "unknown" to match Heritrix. (mime_type filter: fix matching null content-types #237)
  • The MIME type filter had a bug that prevented LIMIT filters from being applied to recorded URLs with a null content type. This has been fixed, so those records can now skip archiving as intended.
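As a rough illustration of the crawl log change, here is a minimal Python sketch, assuming a simplified crawl log line made of only a few space-separated fields. The helper names (format_field, format_content_type, format_line) and the chosen field subset and order are hypothetical; they are not this project's actual code or the full Heritrix field layout. The point is only the null-handling rule: most null fields still become "-", while a null content type now becomes "unknown".

```python
from typing import Optional

# Hypothetical helpers illustrating the null-handling rule described above;
# they are not the project's real crawl log writer.

NULL_FIELD = "-"                # placeholder still used for most null fields
NULL_CONTENT_TYPE = "unknown"   # Heritrix-compatible placeholder for a null content type


def format_field(value: Optional[object]) -> str:
    """Render a generic crawl log field, writing "-" when it is null."""
    return NULL_FIELD if value is None else str(value)


def format_content_type(content_type: Optional[str]) -> str:
    """Render the content-type field, writing "unknown" when it is null."""
    return NULL_CONTENT_TYPE if content_type is None else content_type


def format_line(status: int, url: str, via: Optional[str],
                content_type: Optional[str]) -> str:
    """Join a hypothetical subset of crawl log fields with single spaces."""
    return " ".join([
        str(status),
        url,
        format_field(via),                  # null "via" stays "-"
        format_content_type(content_type),  # null content type becomes "unknown"
    ])


if __name__ == "__main__":
    print(format_line(200, "https://example.com/", None, None))
    # -> 200 https://example.com/ - unknown
```

Keeping the content-type placeholder separate from the generic one reflects the change above: only that one field diverges from the "-" convention, so other null fields are unaffected.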

mistydemeo merged commit b53d061 into master Aug 7, 2025
3 checks passed
mistydemeo deleted the misty/release_2_10_0 branch August 7, 2025 20:32