Core: Stream DV Puffin rewrite in RewriteTablePathUtil#16960

Open
JandyTenedora wants to merge 1 commit into apache:main from JandyTenedora:jandyt/stream-dv-puffin-rewrite

@JandyTenedora JandyTenedora commented Jun 25, 2026

rewriteDVFile collected all rewritten blobs into an in-memory list before writing them out, creating unnecessary memory pressure for large deletion vector (DV) files. This change opens the PuffinReader and PuffinWriter together in the same try-with-resources statement and streams each blob directly to the writer as it is read, keeping memory bounded to a single blob instead of the full file contents.
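
For readers who want the shape of the change without opening the diff, here is a minimal, self-contained sketch of the streaming pattern described above. It is not the code in this PR and does not use the Iceberg Puffin API: `BlobSource`, `BlobSink`, `ListSource`, `ListSink`, and `rewriteStreaming` are hypothetical stand-ins, and blobs are modeled as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

public class StreamingRewriteSketch {

  /** Hypothetical blob source; a stand-in for a reader, not the Iceberg PuffinReader API. */
  interface BlobSource extends AutoCloseable {
    Iterator<String> blobs();

    @Override
    void close();
  }

  /** Hypothetical blob sink; a stand-in for a writer, not the Iceberg PuffinWriter API. */
  interface BlobSink extends AutoCloseable {
    void write(String blob);

    @Override
    void close();
  }

  /**
   * Opens the source and the sink in one try-with-resources statement and forwards each
   * blob as soon as it has been rewritten, so this method never accumulates blobs in a list.
   */
  static void rewriteStreaming(BlobSource source, BlobSink sink, UnaryOperator<String> rewrite) {
    try (BlobSource in = source;
        BlobSink out = sink) {
      Iterator<String> it = in.blobs();
      while (it.hasNext()) {
        out.write(rewrite.apply(it.next()));
      }
    }
  }

  /** In-memory source, used only to exercise the sketch. */
  static final class ListSource implements BlobSource {
    private final List<String> blobs;
    boolean closed = false;

    ListSource(List<String> blobs) {
      this.blobs = blobs;
    }

    @Override
    public Iterator<String> blobs() {
      return blobs.iterator();
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** In-memory sink, used only to exercise the sketch. */
  static final class ListSink implements BlobSink {
    final List<String> written = new ArrayList<>();
    boolean closed = false;

    @Override
    public void write(String blob) {
      written.add(blob);
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    ListSink sink = new ListSink();
    rewriteStreaming(
        new ListSource(List.of("old/a", "old/b")),
        sink,
        blob -> blob.replace("old/", "new/"));
    System.out.println(sink.written);
  }
}
```

Declaring both resources in the same try header means the sink is closed first and the source second, even if a write fails partway through. The buffered approach would instead need the whole list to exist before the first write.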

Added a test that creates real serialized deletion vectors, rewrites the Puffin file, and round-trips through DVUtil.readDV() to verify the output is valid.

Closes #15924

@github-actions github-actions Bot added the core label Jun 25, 2026
rewriteDVFile collected all rewritten blobs into an in-memory list
before writing them out, creating unnecessary memory pressure for
large DV files. Open the PuffinReader and PuffinWriter together and
stream each blob directly to the output as it is read, keeping memory
bounded to a single blob instead of the full file contents.
@JandyTenedora JandyTenedora force-pushed the jandyt/stream-dv-puffin-rewrite branch from c6eae3c to ab3fe64 Compare June 25, 2026 15:53
@JandyTenedora JandyTenedora marked this pull request as ready for review June 25, 2026 16:44