Core: Stream DV Puffin rewrite in RewriteTablePathUtil #16960
Open
JandyTenedora wants to merge 1 commit into
rewriteDVFile collected all rewritten blobs into an in-memory list before writing them out, creating unnecessary memory pressure for large DV files. Open the PuffinReader and PuffinWriter together and stream each blob directly to the output as it is read, keeping memory bounded to a single blob instead of the full file contents.
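The streaming pattern described above can be sketched in plain Java. This is a minimal illustration, not the code in this PR: `Blob`, `BlobSource`, `BlobSink`, `ListSink`, and `sourceOf` are hypothetical stand-ins for the Puffin reader and writer, and the prefix replacement on the referenced path is an assumption about what "rewriting" a blob involves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch only: every type here is a hypothetical stand-in, not an Iceberg class.
class StreamingDvRewriteSketch {

  /** A DV blob reduced to the data file it references plus its serialized bytes. */
  static final class Blob {
    final String referencedDataFile;
    final byte[] payload;

    Blob(String referencedDataFile, byte[] payload) {
      this.referencedDataFile = referencedDataFile;
      this.payload = payload;
    }
  }

  /** Stand-in for a reader that yields blobs one at a time. */
  interface BlobSource extends AutoCloseable {
    Iterator<Blob> blobs();

    @Override
    void close();
  }

  /** Stand-in for a writer that accepts blobs one at a time. */
  interface BlobSink extends AutoCloseable {
    void write(Blob blob);

    @Override
    void close();
  }

  /** In-memory sink so the sketch is observable; a real writer would flush to a file. */
  static final class ListSink implements BlobSink {
    final List<Blob> written = new ArrayList<>();
    boolean closed = false;

    @Override
    public void write(Blob blob) {
      written.add(blob);
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Wraps a list as a source, purely for demonstration. */
  static BlobSource sourceOf(List<Blob> blobs) {
    return new BlobSource() {
      @Override
      public Iterator<Blob> blobs() {
        return blobs.iterator();
      }

      @Override
      public void close() {}
    };
  }

  /** Rewrites each blob's referenced path and hands it to the sink immediately. */
  static int rewrite(BlobSource source, BlobSink sink, String sourcePrefix, String targetPrefix) {
    int count = 0;
    // Both resources stay open for the whole loop and are closed in reverse order.
    try (BlobSource in = source;
        BlobSink out = sink) {
      Iterator<Blob> it = in.blobs();
      while (it.hasNext()) {
        Blob blob = it.next();
        String path = blob.referencedDataFile;
        String newPath =
            path.startsWith(sourcePrefix)
                ? targetPrefix + path.substring(sourcePrefix.length())
                : path;
        // No intermediate list: this method only ever references the current blob.
        out.write(new Blob(newPath, blob.payload));
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<Blob> input =
        Arrays.asList(
            new Blob("s3://old/data/a.parquet", new byte[] {1}),
            new Blob("s3://other/b.parquet", new byte[] {2}));
    ListSink sink = new ListSink();
    int count = rewrite(sourceOf(input), sink, "s3://old/", "s3://new/");
    System.out.println(count + " blobs rewritten");
    for (Blob blob : sink.written) {
      System.out.println(blob.referencedDataFile);
    }
  }
}
```

The design point is that `rewrite` never accumulates blobs itself, so its memory use depends on the largest single blob rather than on the number of blobs in the file.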
c6eae3c to ab3fe64
wombatu-kun approved these changes on Jun 26, 2026
rewriteDVFile collected all rewritten blobs into an in-memory list before writing them out, creating unnecessary memory pressure for large DV files. This change opens the PuffinReader and PuffinWriter together in the same try-with-resources and streams each blob directly to the writer as it is read, keeping memory bounded to a single blob instead of the full file contents.

Added a test that creates real serialized deletion vectors, rewrites the Puffin file, and round-trips through DVUtil.readDV() to verify the output is valid.

Closes #15924