create PageIndexPolicy to allow optional indexes #8071
- Rename PageIndexPolicy::Off to PageIndexPolicy::Skip - impl From<bool> for PageIndexPolicy for DRY - Expose PageIndexPolicy to Arrow
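The `From<bool>` conversion named in the commit summary could look roughly like the sketch below. It is a standalone illustration, not the crate's code: the enum is redefined locally, and the mapping of `true` to `Required` and `false` to `Skip` is an assumption inferred from the PR description, which says the boolean setters default to `Required` when enabling indexes.

```rust
/// Local stand-in for the enum discussed in this PR (not imported from the parquet crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum PageIndexPolicy {
    /// Do not read the index (called `Off` in the PR description, renamed to `Skip` later).
    #[default]
    Skip,
    /// Read the index if present; a missing index is not an error.
    Optional,
    /// Read the index; a missing index is an error.
    Required,
}

impl From<bool> for PageIndexPolicy {
    fn from(enabled: bool) -> Self {
        // Assumed mapping: the old boolean flags meant "read and require" or "do not read".
        if enabled {
            PageIndexPolicy::Required
        } else {
            PageIndexPolicy::Skip
        }
    }
}
```

With a conversion like this, the existing boolean setters can forward to the policy-based setters through `.into()` instead of repeating the match in each method.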
I think this is a good idea, FWIW and a nice change. Is this PR ready for review @kczimm (it is currently marked as a draft)?
I can see the desire for this, but I think some discussion is warranted to suss out what the desired behavior is for the Optional case.
Thanks for raising the issue @kczimm.
Thanks @kczimm !
I think this looks good to me -- though I think it would be good if @etseidl also had a look before we merged this. I had a few suggestions but nothing that I think is required before merging
The CI isn't passing -- I think if you merge up (or rebase) from main it should be clean
Thanks again for your patience
Sorry, I've been somewhat taken over by thrift and life 😬. I'll try to take another look today, but don't hold this up for me either. My main concern was addressed.
That is what we like to hear! I love fostering the ability to obsess over some low level technical thing to make it really cool!
working on the CI issues...
Looks good, thanks! Just a few minor nits left.
@alamb any last words before merging this?
Nope. DO IT!
Which issue does this PR close?
Rationale for this change
This change introduces a more flexible way to handle page indexes (column and offset indexes) in Parquet files. Previously, reading these indexes was controlled by boolean flags, which could only express "read, and fail if missing" or "do not read". The new `PageIndexPolicy` enum (`Off`, `Optional`, `Required`) provides finer control, allowing users to specify whether an index is not read, read if present (without error if missing), or strictly required (error if missing).
What changes are included in this PR?
- Added the `PageIndexPolicy` enum with `Off`, `Optional`, and `Required` variants.
- Replaced the boolean `column_index` and `offset_index` fields in `ParquetMetaDataReader` with the new `PageIndexPolicy` enum.
- Updated the `ParquetMetaDataReader::new()` function to initialize page index policies to `Off`, preserving previous defaults.
- Updated the `with_page_indexes`, `with_column_indexes`, and `with_offset_indexes` methods to utilize the new `PageIndexPolicy`, defaulting to `Required` when enabling indexes.
- Added `with_page_index_policy`, `with_column_index_policy`, and `with_offset_index_policy` to allow direct setting of the page index policy.
- Handling of `PageIndexPolicy` when reading indexes, including returning an error if a `Required` index is not found.
Are these changes tested?
Yes, a new test file `parquet/tests/page_index.rs` has been added to cover the functionality of the new `PageIndexPolicy` and its integration with `ParquetMetaDataReader`.
Are there any user-facing changes?
Yes, there are user-facing changes to the `ParquetMetaDataReader` API. The `with_column_indexes` and `with_offset_indexes` methods now implicitly use `PageIndexPolicy::Required` when enabling page indexes. New methods `with_page_index_policy`, `with_column_index_policy`, and `with_offset_index_policy` have been added.
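To make the three behaviors concrete, here is a minimal, self-contained sketch of how the policy might drive both the builder methods and the "index missing" decision. It does not use the parquet crate: `MetadataReaderSketch`, `MissingIndexError`, and `resolve_index` are hypothetical names invented for illustration, the builder method names mirror the ones listed in this description, and the variant is spelled `Skip` following the later rename of `Off`.

```rust
/// Local stand-in for the enum in this PR (`Skip` is the renamed `Off`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum PageIndexPolicy {
    #[default]
    Skip,
    Optional,
    Required,
}

/// Hypothetical error carrying the name of the index that was required but absent.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingIndexError(pub &'static str);

/// Hypothetical reader holding one policy per index kind.
#[derive(Debug, Default)]
pub struct MetadataReaderSketch {
    pub column_index: PageIndexPolicy,
    pub offset_index: PageIndexPolicy,
}

impl MetadataReaderSketch {
    /// Both policies start as `Skip`, matching the old "do not read" default.
    pub fn new() -> Self {
        Self::default()
    }

    /// Boolean setter kept for compatibility: `true` is assumed to mean `Required`.
    pub fn with_column_indexes(self, enabled: bool) -> Self {
        let policy = if enabled {
            PageIndexPolicy::Required
        } else {
            PageIndexPolicy::Skip
        };
        self.with_column_index_policy(policy)
    }

    pub fn with_column_index_policy(mut self, policy: PageIndexPolicy) -> Self {
        self.column_index = policy;
        self
    }

    pub fn with_offset_index_policy(mut self, policy: PageIndexPolicy) -> Self {
        self.offset_index = policy;
        self
    }

    /// Apply one policy to both the column index and the offset index.
    pub fn with_page_index_policy(self, policy: PageIndexPolicy) -> Self {
        self.with_column_index_policy(policy)
            .with_offset_index_policy(policy)
    }
}

/// Decide what to return for an index that may or may not be present in the file.
pub fn resolve_index<T>(
    policy: PageIndexPolicy,
    found: Option<T>,
    name: &'static str,
) -> Result<Option<T>, MissingIndexError> {
    match (policy, found) {
        // Skip: never surface the index, even if the file has one.
        (PageIndexPolicy::Skip, _) => Ok(None),
        // Optional: pass through whatever was found; absence is fine.
        (PageIndexPolicy::Optional, found) => Ok(found),
        // Required: absence is an error.
        (PageIndexPolicy::Required, Some(index)) => Ok(Some(index)),
        (PageIndexPolicy::Required, None) => Err(MissingIndexError(name)),
    }
}
```

Under these assumptions, `MetadataReaderSketch::new().with_page_index_policy(PageIndexPolicy::Optional)` reads files with or without page indexes, while `with_column_indexes(true)` keeps the stricter behavior of the old boolean flag.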