feat: add support for _partition metadata column by parthchandra · Pull Request #2668 · apache/iceberg-rust

parthchandra · 2026-06-18T00:34:03Z

Which issue does this PR close?

part of #Support for projecting metadata columns _pos, _spec_id, and _partition in table scan #2607.

What changes are included in this PR?

Implements the _partition metadata column for table scans. This is a struct column whose type is the union of all partition fields across all partition specs (handling partition evolution). Each row gets the
partition values for its data file.

Adds compute_unified_partition_type() to compute the union of partition fields across all specs (equivalent to Java's Partitioning.partitionType())
Adds PartitionColumnConstant and build_partition_column_constant() for pre-computing the struct values per file
Adds ColumnSource::AddStructConstant variant to RecordBatchTransformer for materializing struct columns
Threads the unified partition type through scan planning and populates the constant in FileScanTask
Pipeline detects RESERVED_FIELD_ID_PARTITION in projected fields and injects the struct constant

Are these changes tested?

Because we do not have write support yet, I made the corresponding change to comet and then tested by adding tests in Comet which uses iceberg-java to write files and then iceberg-rust to read them back.
https://github.com/parthchandra/datafusion-comet/blob/iceberg-metadata-columns/spark/src/test/resources/sql-tests/iceberg/metadata_column_partition.sql

parthchandra · 2026-06-18T00:36:27Z

@advancedxy fyi

advancedxy · 2026-06-18T07:39:47Z

+/// # Arguments
+/// * `partition_specs` - Iterator over all partition specs in the table
+/// * `schema` - The current table schema (needed to determine result types of transforms)
+pub fn compute_unified_partition_type<'a>(


nit: it's a bit of odd that this function is added in metadata_column.rs. Do you think it's a good idea to create partitioning.rs and put this function into partitioning.rs?

advancedxy · 2026-06-21T11:56:28Z

+    let mut seen_field_ids = std::collections::HashSet::new();
+    let mut struct_fields: Vec<NestedFieldRef> = Vec::new();
+
+    for spec in partition_specs {


I notice some inconsistent with java's impl:

unknown specs are rejected in java.

specs are sorted by spec id first(in reverse order), which means newer partition spec's field name will be picked fist.

V1 table's void transform field is also handled in java: the partition field that was dropped later.

advancedxy · 2026-06-21T12:02:35Z

+        let partition_column_constant =
+            if let Some(ref unified_partition_type) = self.unified_partition_type {
+                let partition_spec = self
+                    .table_metadata
+                    .partition_spec_by_id(self.partition_spec_id);
+                if let Some(spec) = partition_spec {
+                    let constant = build_partition_column_constant(
+                        unified_partition_type,
+                        spec,
+                        &self.manifest_entry.data_file.partition,
+                    )?;
+                    Some(Arc::new(constant))
+                } else {
+                    None
+                }
+            } else {
+                None
+            };


I'm not sure this is a good idea to calculate the partition column in the plan/scan phase and it adds build_partition_column_constant dep from record_batch_transformer' mod into the scan side.

advancedxy · 2026-06-21T12:12:58Z

+    #[serde(serialize_with = "serialize_not_implemented")]
+    #[serde(deserialize_with = "deserialize_not_implemented")]
+    #[builder(default)]
+    pub partition_column_constant: Option<Arc<PartitionColumnConstant>>,


Like the comment in https://github.com/apache/iceberg-rust/pull/2668/changes#r3448373215, I think it's better to pass the unified partition type here rather than passing the actual unified partition value. It would be easier for comet to pooling the type rather than the actual value.

BTW, this would be unnecessary if we can rebuild/access the table/metadata when reading on the executor side. Java archives this by SerializableTable and with Spark's broadcast. We don't have similar thing on the rust yet.

advancedxy · 2026-06-21T12:14:35Z

+                let mut ids: Vec<i32> = value.identifier_field_ids.into_iter().collect();
+                ids.sort_unstable();
+                Some(ids)
            },


the changes in this file seem unrelated? And I don't think the spec requires sorting identifier field ids.

advancedxy · 2026-06-21T12:16:36Z

+        (DataType::LargeBinary, Some(PrimitiveLiteral::Binary(value))) => {
+            Arc::new(LargeBinaryArray::from_vec(vec![value; num_rows]))
+        }
+        (DataType::LargeBinary, None) => {
+            let vals: Vec<Option<&[u8]>> = vec![None; num_rows];
+            Arc::new(LargeBinaryArray::from_opt_vec(vals))
+        }
+        (DataType::FixedSizeBinary(len), Some(PrimitiveLiteral::Binary(value))) => {
+            let repeated: Vec<&[u8]> = vec![value.as_slice(); num_rows];
+            Arc::new(FixedSizeBinaryArray::try_from_iter(repeated.into_iter()).map_err(|e| {
+                Error::new(
+                    ErrorKind::DataInvalid,
+                    format!("Failed to create FixedSizeBinary({len}) array: {e}"),
+                )
+            })?)
+        }
+        (DataType::FixedSizeBinary(len), None) => {
+            let repeated: Vec<Option<&[u8]>> = vec![None; num_rows];
+            Arc::new(FixedSizeBinaryArray::try_from_sparse_iter_with_size(repeated.into_iter(), *len).map_err(|e| {
+                Error::new(
+                    ErrorKind::DataInvalid,
+                    format!("Failed to create null FixedSizeBinary({len}) array: {e}"),
+                )
+            })?)
+        }


these seems not related?

Our internal integration also shows iceberg mapping Iceberg's binary to Arrow's LargeBinary though, which should also be updated.

advancedxy · 2026-06-21T12:21:11Z

+
+    // A struct column where each child is a constant primitive value.
+    // Used for the _partition metadata column.
+    AddStructConstant {


hmmm. why this is not added to the constant fields? I think java's impl simply add partition's field constant map with a struct projection to mapping the task's partition data into the unified type?

advancedxy · 2026-06-21T12:23:27Z

@parthchandra thanks for pinging me and working on this. I think I'm concerned that the unified partition value is carried in the file scan task, which seems a bit of odd.

feat: add support for _partition metadata column

bd1189e

parthchandra marked this pull request as draft June 18, 2026 00:34

fix ci (public-api, and fmt)

caa7fb0

parthchandra force-pushed the metadata-columns branch from 074f903 to caa7fb0 Compare June 18, 2026 01:19

parthchandra marked this pull request as ready for review June 18, 2026 01:46

advancedxy reviewed Jun 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add support for _partition metadata column#2668

feat: add support for _partition metadata column#2668
parthchandra wants to merge 2 commits into
apache:mainfrom
parthchandra:metadata-columns

parthchandra commented Jun 18, 2026

Uh oh!

parthchandra commented Jun 18, 2026

Uh oh!

advancedxy Jun 18, 2026

Uh oh!

advancedxy Jun 21, 2026

Uh oh!

advancedxy Jun 21, 2026

Uh oh!

advancedxy Jun 21, 2026

Uh oh!

advancedxy Jun 21, 2026

Uh oh!

advancedxy Jun 21, 2026

Uh oh!

advancedxy Jun 21, 2026

Uh oh!

advancedxy commented Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

parthchandra commented Jun 18, 2026

Which issue does this PR close?

What changes are included in this PR?

Are these changes tested?

Uh oh!

parthchandra commented Jun 18, 2026

Uh oh!

advancedxy Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

advancedxy Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

advancedxy Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

advancedxy Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

advancedxy Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

advancedxy Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

advancedxy Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

advancedxy commented Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants