Do not evaluate predicates if they can be proven to be true #19028

@alamb

Description

Is your feature request related to a problem or challenge?

In the context of #18868, @xudong963 is adding the ability to tell when a predicate is always true for a particular Parquet row group (i.e. it filters out no rows).

@crepererum noted that this enables another interesting optimization: when we know the predicate is true for the entire row group, we can skip evaluating the predicate for that row group entirely.

This can improve performance because evaluating the filter can itself be quite expensive. It is also a no-tradeoff optimization: whenever it can be applied, it is a win.

@adriangb reported that they have implemented this optimization at their company and have seen substantial improvements.

Describe the solution you'd like

If a predicate is determined not to filter out any rows for a particular row group, do not apply it to that row group.
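To make the idea concrete, here is a minimal sketch of the decision for a single predicate of the form `col > threshold`. It is not DataFusion or parquet crate code: `ColumnStats`, `Verdict`, `classify_gt`, and `scan_row_group` are hypothetical names invented for illustration, and the sketch assumes the row group has min/max statistics and at least one non-null value.

```rust
/// Min/max statistics for one column in one row group (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
    pub null_count: u64,
}

/// What the statistics let us conclude about `col > threshold`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    /// Every row passes: the filter can be skipped for this row group.
    AlwaysTrue,
    /// No row passes: the whole row group can be pruned.
    AlwaysFalse,
    /// Statistics are inconclusive: the filter must be evaluated per row.
    Unknown,
}

pub fn classify_gt(stats: &ColumnStats, threshold: i64) -> Verdict {
    if stats.max <= threshold {
        // No non-null value exceeds the threshold, and nulls never pass.
        Verdict::AlwaysFalse
    } else if stats.min > threshold && stats.null_count == 0 {
        // Every value exceeds the threshold and there are no nulls.
        Verdict::AlwaysTrue
    } else {
        Verdict::Unknown
    }
}

/// Scans one row group and returns the kept rows together with the number
/// of per-row predicate evaluations that were actually performed.
pub fn scan_row_group(
    values: &[Option<i64>],
    stats: &ColumnStats,
    threshold: i64,
) -> (Vec<Option<i64>>, usize) {
    match classify_gt(stats, threshold) {
        Verdict::AlwaysFalse => (Vec::new(), 0),
        // The optimization from this issue: keep all rows, evaluate nothing.
        Verdict::AlwaysTrue => (values.to_vec(), 0),
        Verdict::Unknown => {
            let kept: Vec<Option<i64>> = values
                .iter()
                .copied()
                .filter(|v| matches!(v, Some(x) if *x > threshold))
                .collect();
            (kept, values.len())
        }
    }
}
```

Note that the `AlwaysTrue` case requires `null_count == 0`: a null never satisfies a comparison, so a row group containing nulls cannot be proven to pass in full from min/max alone.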

Describe alternatives you've considered

This is likely only relevant when Parquet pushdown is enabled.

I think the API for evaluating Parquet predicates would also need some updating, as there is currently no way to evaluate predicates on some row groups but not others.
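One possible shape for such an API, purely as a sketch: the caller supplies one access decision per row group, and the reader only invokes the predicate where asked. `RowGroupAccess` and `scan_with_plan` are hypothetical names and do not exist in the parquet or DataFusion crates; row groups are modeled as plain `Vec<i64>` to keep the example self-contained.

```rust
/// Per-row-group decision handed to the reader (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowGroupAccess {
    /// Predicate proven false for every row: do not read the row group.
    Skip,
    /// Predicate proven true for every row: read without filtering.
    ReadAll,
    /// Nothing proven: read and evaluate the predicate per row.
    ReadFiltered,
}

/// Reads the row groups according to `plan`, calling `predicate` only for
/// row groups marked `ReadFiltered`.
pub fn scan_with_plan<F>(
    row_groups: &[Vec<i64>],
    plan: &[RowGroupAccess],
    mut predicate: F,
) -> Vec<i64>
where
    F: FnMut(i64) -> bool,
{
    assert_eq!(row_groups.len(), plan.len(), "one plan entry per row group");
    let mut out = Vec::new();
    for (rows, access) in row_groups.iter().zip(plan) {
        match access {
            RowGroupAccess::Skip => {}
            RowGroupAccess::ReadAll => out.extend_from_slice(rows),
            RowGroupAccess::ReadFiltered => {
                out.extend(rows.iter().copied().filter(|v| predicate(*v)));
            }
        }
    }
    out
}
```

With three row groups planned as `Skip`, `ReadAll`, and `ReadFiltered`, the predicate would run only on the rows of the third group.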

Additional context

This came up in the context of a discussion with @xudong963 @adriangb here

Metadata

Labels

enhancement: New feature or request
