Skip to content

[Variant] Fix broken metadata builder rollback #8135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

scovich
Copy link
Contributor

@scovich scovich commented Aug 13, 2025

Which issue does this PR close?

Rationale for this change

New unit tests demonstrate that variant builder rollback was broken, producing various validation failures. The problem was subtle -- using buffer length instead of field count when rolling back metadata builder state.

What changes are included in this PR?

Fix the bug, and fix two existing unit tests that expected wrong behavior.

While we're at it, add a human-readable impl Debug for Variant, which gives a convenient way of comparing two variant values. Also add the missing VariantBuilder::[try_]with_value methods that the other two builders already had.

Are these changes tested?

New and existing unit tests cover the changes.

Are there any user-facing changes?

Output of impl Debug for Variant changed.

Two new VariantBuilder methods.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Aug 13, 2025
@scovich scovich force-pushed the variant-nested-rollback branch from 59f9e60 to 27b3cdb Compare August 14, 2025 03:04
@scovich scovich changed the title [Variant] Nested builder rollback is broken [Variant] Fix broken metadata builder rollback Aug 14, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @scovich -- this is a very nice find I had some suggestions, the most important of which I think are the naming of metadata_current_offset and the Debug display

///
/// This method will panic if the variant contains duplicate field names in objects
/// when validation is enabled. For a fallible version, use [`ListBuilder::try_with_value`].
pub fn with_value<'m, 'd, T: Into<Variant<'m, 'd>>>(mut self, value: T) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

impl std::fmt::Debug for Variant<'_, '_> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Variant::Null => write!(f, "null"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the debug printing we should also include the variant enum, otherwise one wouldn't be able to tell the difference between Variant::Int8(42) and Variant::Int16(42) from the debug output, as they would both be rendered as 42

So perhaps we can keep

Suggested change
Variant::Null => write!(f, "null"),
Variant::Null => write!(f, "Variant::Null"),

And then apply some nicer formatting to the binary / object ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled out the impl Debug as its own PR that should merge first:

Let's pick up the conversation there as needed, this PR will shrink a lot after rebasing on top of it.

@scovich scovich requested a review from alamb August 14, 2025 15:29
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you -- I think this looks great @scovich

@alamb
Copy link
Contributor

alamb commented Aug 14, 2025

FYI @klion26 @abacef and @friendlymatthew as you may be interested in this PR

Copy link
Member

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix, LGTM

list.finish();
}
}
if i % skip != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

Do we need to make the finish conditions for the inner and outer objects different? so that the field names added by the inner and outer objects are different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet-variant parquet-variant* crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Variant] Nested builder rollback is broken
3 participants