doc: add example of RowFilter usage #9115

sonhmai · 2026-01-08T06:54:45Z

Which issue does this PR close?

Closes Document / Add an example of RowFilter usage #9096.

Rationale for this change

The RowFilter API exists and can evaluate predicates during decoding, but it has no usage examples.

What changes are included in this PR?

Added a rustdoc example and blog link to ParquetRecordBatchReaderBuilder::with_row_filter.
Added a running example in parquet/examples/read_with_row_filter.rs

Are these changes tested?

Yes

cargo run -p parquet --example read_with_row_filter
cargo test -p parquet --doc

Are there any user-facing changes?

Yes, doc only. No API changes.

sonhmai · 2026-01-08T07:43:03Z

@alamb would you mind reviewing this? Thanks!

Jefffrey · 2026-01-10T08:30:57Z

parquet/examples/read_with_row_filter.rs

+use parquet::errors::Result;
+use std::fs::File;
+
+// RowFilter / with_row_filter usage. For background and more


Are we better off removing this and keeping only the doctest to reduce duplication?

Jefffrey · 2026-01-10T08:33:25Z

parquet/src/arrow/arrow_reader/mod.rs

    /// more efficient skipping over data pages. See [`ArrowReaderOptions::with_page_index`].
+    ///
+    /// For a running example see `parquet/examples/read_with_row_filter.rs`.
+    /// See <https://arrow.apache.org/blog/2025/12/11/parquet-late-materialization-deep-dive/>


/// See the [blog post on late materialization] for a more technical explanation.
///
/// ...
///
/// [blog post on late materialization]: https://arrow.apache.org/blog/2025/12/11/parquet-late-materialization-deep-dive/

Slightly nicer formatting this way

alamb

Thank you @sonhmai and @Jefffrey -- this is great work and a nice addition.

I think @Jefffrey's suggestions and mine would make this PR better, but I also think we could merge it as is and iterate in a follow-on PR. Just let us know what you would like to do, @sonhmai.

alamb · 2026-01-11T12:17:00Z

parquet/examples/read_with_row_filter.rs

+use parquet::errors::Result;
+use std::fs::File;
+
+// RowFilter / with_row_filter usage. For background and more


Yes, I agree -- I think the doc examples are easier to find so I recommend removing this example file

Actually, looking at the existing examples I think many of them are redundant / would be easier to find if we moved them into the documentation:
https://github.com/apache/arrow-rs/tree/main/parquet/examples

alamb · 2026-01-11T12:18:34Z

parquet/src/arrow/arrow_reader/mod.rs

+    /// let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;
+    /// let schema_desc = builder.metadata().file_metadata().schema_descr_ptr();
+    ///
+    /// // Create predicate: column id > 4. This col has index 0.


Suggested change

-    /// // Create predicate: column id > 4. This col has index 0.
+    /// // Create predicate that evaluates `id > 4`. The `id` column has index 0.

alamb · 2026-01-11T12:20:32Z

parquet/src/arrow/arrow_reader/mod.rs

+    /// // Create predicate: column id > 4. This col has index 0.
+    /// let projection = ProjectionMask::leaves(&schema_desc, [0]);
+    /// let predicate = ArrowPredicateFn::new(projection, |batch| {
+    ///     let id_col = batch.column(0);


As a minor suggestion, I think the example would be nicer if it used a column other than column 0, so that it is clear the batch passed to the predicate contains only the columns selected by the projection.

For example, perhaps you could use `int_col` (column index 4):

> select * from './parquet-testing/data/alltypes_plain.parquet';
+----+----------+-------------+--------------+---------+------------+-----------+------------+------------------+------------+---------------------+
| id | bool_col | tinyint_col | smallint_col | int_col | bigint_col | float_col | double_col | date_string_col  | string_col | timestamp_col       |
+----+----------+-------------+--------------+---------+------------+-----------+------------+------------------+------------+---------------------+
| 4  | true     | 0           | 0            | 0       | 0          | 0.0       | 0.0        | 30332f30312f3039 | 30         | 2009-03-01T00:00:00 |
| 5  | false    | 1           | 1            | 1       | 10         | 1.1       | 10.1       | 30332f30312f3039 | 31         | 2009-03-01T00:01:00 |
| 6  | true     | 0           | 0            | 0       | 0          | 0.0       | 0.0        | 30342f30312f3039 | 30         | 2009-04-01T00:00:00 |
| 7  | false    | 1           | 1            | 1       | 10         | 1.1       | 10.1       | 30342f30312f3039 | 31         | 2009-04-01T00:01:00 |
| 2  | true     | 0           | 0            | 0       | 0          | 0.0       | 0.0        | 30322f30312f3039 | 30         | 2009-02-01T00:00:00 |
| 3  | false    | 1           | 1            | 1       | 10         | 1.1       | 10.1       | 30322f30312f3039 | 31         | 2009-02-01T00:01:00 |
| 0  | true     | 0           | 0            | 0       | 0          | 0.0       | 0.0        | 30312f30312f3039 | 30         | 2009-01-01T00:00:00 |
| 1  | false    | 1           | 1            | 1       | 10         | 1.1       | 10.1       | 30312f30312f3039 | 31         | 2009-01-01T00:01:00 |
+----+----------+-------------+--------------+---------+------------+-----------+------------+------------------+------------+---------------------+
8 row(s) fetched.
Elapsed 0.039 seconds.

> describe './parquet-testing/data/alltypes_plain.parquet';
+-----------------+---------------+-------------+
| column_name     | data_type     | is_nullable |
+-----------------+---------------+-------------+
| id              | Int32         | YES         |
| bool_col        | Boolean       | YES         |
| tinyint_col     | Int32         | YES         |
| smallint_col    | Int32         | YES         |
| int_col         | Int32         | YES         |
| bigint_col      | Int64         | YES         |
| float_col       | Float32       | YES         |
| double_col      | Float64       | YES         |
| date_string_col | BinaryView    | YES         |
| string_col      | BinaryView    | YES         |
| timestamp_col   | Timestamp(ns) | YES         |
+-----------------+---------------+-------------+
11 row(s) fetched.
Elapsed 0.005 seconds.
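
The indexing point in this suggestion can be sketched with a toy, std-only Rust model. This is not the parquet crate API: `ToyFile` and `read_with_filter` are hypothetical names invented for illustration, and the data is a hand-copied subset of the `id` and `int_col` values shown above. The sketch only shows why a predicate that projects file column 4 reads that column at index 0 inside the predicate, and how the resulting mask then selects rows from the output columns.

```rust
/// Toy column-oriented "file": each inner Vec is one column of i32 values.
/// Hypothetical type for illustration only, not part of the parquet crate.
struct ToyFile {
    columns: Vec<Vec<i32>>,
}

/// Evaluate `predicate` on the projected columns only, then materialize the
/// `output` columns for the rows that passed.
fn read_with_filter<F>(
    file: &ToyFile,
    projection: &[usize],
    predicate: F,
    output: &[usize],
) -> Vec<Vec<i32>>
where
    F: Fn(&[&[i32]]) -> Vec<bool>,
{
    // The batch handed to the predicate holds only the projected columns,
    // re-indexed from 0 in projection order.
    let batch: Vec<&[i32]> = projection
        .iter()
        .map(|&i| file.columns[i].as_slice())
        .collect();

    // One boolean per row: true means "keep this row".
    let mask = predicate(&batch);

    // Materialize the requested output columns for the selected rows only.
    output
        .iter()
        .map(|&i| {
            file.columns[i]
                .iter()
                .zip(&mask)
                .filter(|(_, keep)| **keep)
                .map(|(v, _)| *v)
                .collect()
        })
        .collect()
}

fn main() {
    let file = ToyFile {
        columns: vec![
            vec![4, 5, 6, 7, 2, 3, 0, 1], // column 0: id
            vec![0; 8],                   // columns 1-3: placeholders
            vec![0; 8],
            vec![0; 8],
            vec![0, 1, 0, 1, 0, 1, 0, 1], // column 4: int_col
        ],
    };

    // Project file column 4. Inside the predicate it is batch[0], not batch[4].
    let ids = read_with_filter(
        &file,
        &[4],
        |batch| batch[0].iter().map(|&v| v > 0).collect(),
        &[0],
    );

    println!("{:?}", ids); // prints [[5, 7, 3, 1]]
}
```

Writing `batch[4]` inside the closure would panic in this model, because the batch has a single column. That is the confusion a non-zero column index in the doc example would help readers avoid.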

github-actions bot added the parquet Changes to the parquet crate label Jan 8, 2026

sonhmai force-pushed the doc/row-filter-usage-9096 branch from 37be4e1 to f286dfd Compare January 8, 2026 06:59

sonhmai changed the title ~~doc: add example of RowFilter usage~~ draft: doc: add example of RowFilter usage Jan 8, 2026

doc: add example of RowFilter usage

bc8e06f

sonhmai force-pushed the doc/row-filter-usage-9096 branch from f286dfd to bc8e06f Compare January 8, 2026 07:32

sonhmai changed the title ~~draft: doc: add example of RowFilter usage~~ doc: add example of RowFilter usage Jan 8, 2026

sonhmai mentioned this pull request Jan 8, 2026

Document / Add an example of RowFilter usage #9096

Open

Jefffrey reviewed Jan 10, 2026

alamb approved these changes Jan 11, 2026
