
Multiple WARCs #84

Merged
extua merged 4 commits into bodleian:main from jamesdbaker:multiple_warcs on Apr 28, 2026

Conversation

@jamesdbaker
Contributor

This PR adds support for packaging multiple WARC files into a single WACZ by adding a from_files function to the WACZ interface. The original from_file function is kept for backwards compatibility, but it now delegates to from_files. Supporting multiple files also required changes to the datapackage module.
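As an illustration only, the delegation pattern described above might look like the following Rust sketch. Rust is an assumption (the thread mentions conditional compilation), and Wacz, WaczError and archive_entries are hypothetical stand-ins, not the crate's real types. The point is that from_file simply wraps its single path in a one-element slice and calls from_files, so both entry points share one code path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the crate's WACZ type; the real struct holds far more state.
#[derive(Debug)]
pub struct Wacz {
    /// Hypothetical: one package path per input WARC, e.g. "archive/a.warc.gz".
    pub archive_entries: Vec<String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug)]
pub enum WaczError {
    NoInputs,
    BadFileName(PathBuf),
}

impl Wacz {
    /// Build one WACZ from any number of WARC files.
    pub fn from_files<P: AsRef<Path>>(warcs: &[P]) -> Result<Wacz, WaczError> {
        if warcs.is_empty() {
            return Err(WaczError::NoInputs);
        }
        let mut archive_entries = Vec::with_capacity(warcs.len());
        for warc in warcs {
            let path = warc.as_ref();
            // Each WARC gets its own entry under archive/, which a
            // multi-file datapackage would then list as a separate resource.
            let name = path
                .file_name()
                .and_then(|n| n.to_str())
                .ok_or_else(|| WaczError::BadFileName(path.to_path_buf()))?;
            archive_entries.push(format!("archive/{name}"));
        }
        Ok(Wacz { archive_entries })
    }

    /// Kept for backwards compatibility: a single WARC is just a one-element slice.
    pub fn from_file<P: AsRef<Path>>(warc: P) -> Result<Wacz, WaczError> {
        Self::from_files(&[warc])
    }
}
```

Note that this sketch does not handle two inputs that share a file name in different directories; a real implementation would need to decide how to name such entries.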

I also changed the CLI so that you can specify multiple WARCs and, optionally, an output path.
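A minimal sketch of argument handling that accepts several WARCs plus an optional output path, using only std. The -o/--output flag names, the CliArgs struct, and the output.wacz default are assumptions for illustration, not the tool's actual interface.

```rust
use std::path::PathBuf;

/// Hypothetical parsed command line; the real CLI's flag names may differ.
#[derive(Debug, PartialEq)]
pub struct CliArgs {
    pub warcs: Vec<PathBuf>,
    pub output: PathBuf,
}

/// Parse `[-o OUTPUT] WARC [WARC ...]` from the arguments after the program name.
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliArgs, String> {
    let mut warcs = Vec::new();
    let mut output = None;
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        if arg == "-o" || arg == "--output" {
            // The flag consumes the next argument as the output path.
            let value = iter.next().ok_or("missing value after -o/--output")?;
            output = Some(PathBuf::from(value));
        } else {
            // Every other argument is treated as an input WARC.
            warcs.push(PathBuf::from(arg));
        }
    }
    if warcs.is_empty() {
        return Err("at least one WARC file is required".to_string());
    }
    Ok(CliArgs {
        warcs,
        // Assumed default when no output path is given.
        output: output.unwrap_or_else(|| PathBuf::from("output.wacz")),
    })
}
```

In a binary, this would be called as parse_args(std::env::args().skip(1)).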

@extua (Collaborator) left a comment

Commit 1feb7b7 is already included in PR #83; could you remove it from this one?

@jamesdbaker
Contributor Author

I've rebased on PR #83, which contains the conditional compilation changes. Without those changes, some tests fail, and I'd be submitting a PR with failing tests.

@jamesdbaker jamesdbaker requested a review from extua April 24, 2026 15:25
@extua (Collaborator) left a comment

I've tested this now on my Debian laptop and it works, thanks for the PR!

However, I'm not able to merge this at the moment. To explain: my employment at the Bodleian Libraries ended in December 2025 and when I lost my Oxford University email address I lost my permission to push code to this repo. I've emailed my old line manager to ask if I could have permissions restored. Until then, I guess this software is officially unmaintained.

In the meantime, could you confirm that you wrote this code yourself rather than generating it with AI? See #81.

@extua
Collaborator

extua commented Apr 27, 2026

Short update: I have admin rights on the repo again! Don't worry about the failing tests; they were failing before your changes, and I should fix them separately. I've run the build from your branch on the two example warc.gz files, and it produced an output.wacz archive that replayed fine in replayweb.page, so I'm confident it works.

@jamesdbaker
Contributor Author

Thanks @extua - I can confirm I wrote the code.

@extua extua merged commit 9bc5f78 into bodleian:main Apr 28, 2026
2 of 4 checks passed