Skip to content

feat: add support for _partition metadata column#2668

Open
parthchandra wants to merge 2 commits into
apache:mainfrom
parthchandra:metadata-columns
Open

feat: add support for _partition metadata column#2668
parthchandra wants to merge 2 commits into
apache:mainfrom
parthchandra:metadata-columns

Conversation

@parthchandra

Copy link
Copy Markdown

Which issue does this PR close?

What changes are included in this PR?

Implements the _partition metadata column for table scans. This is a struct column whose type is the union of all partition fields across all partition specs (handling partition evolution). Each row gets the
partition values for its data file.

  • Adds compute_unified_partition_type() to compute the union of partition fields across all specs (equivalent to Java's Partitioning.partitionType())
  • Adds PartitionColumnConstant and build_partition_column_constant() for pre-computing the struct values per file
  • Adds ColumnSource::AddStructConstant variant to RecordBatchTransformer for materializing struct columns
  • Threads the unified partition type through scan planning and populates the constant in FileScanTask
  • Pipeline detects RESERVED_FIELD_ID_PARTITION in projected fields and injects the struct constant

Are these changes tested?

Because we do not have write support yet, I made the corresponding change to comet and then tested by adding tests in Comet which uses iceberg-java to write files and then iceberg-rust to read them back.
https://github.com/parthchandra/datafusion-comet/blob/iceberg-metadata-columns/spark/src/test/resources/sql-tests/iceberg/metadata_column_partition.sql

@parthchandra parthchandra marked this pull request as draft June 18, 2026 00:34
@parthchandra

Copy link
Copy Markdown
Author

@advancedxy fyi

@parthchandra parthchandra marked this pull request as ready for review June 18, 2026 01:46
/// # Arguments
/// * `partition_specs` - Iterator over all partition specs in the table
/// * `schema` - The current table schema (needed to determine result types of transforms)
pub fn compute_unified_partition_type<'a>(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's a bit of odd that this function is added in metadata_column.rs. Do you think it's a good idea to create partitioning.rs and put this function into partitioning.rs?

let mut seen_field_ids = std::collections::HashSet::new();
let mut struct_fields: Vec<NestedFieldRef> = Vec::new();

for spec in partition_specs {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice some inconsistent with java's impl:

  1. unknown specs are rejected in java.
  2. specs are sorted by spec id first(in reverse order), which means newer partition spec's field name will be picked fist.
  3. V1 table's void transform field is also handled in java: the partition field that was dropped later.

Comment on lines +129 to +146
let partition_column_constant =
if let Some(ref unified_partition_type) = self.unified_partition_type {
let partition_spec = self
.table_metadata
.partition_spec_by_id(self.partition_spec_id);
if let Some(spec) = partition_spec {
let constant = build_partition_column_constant(
unified_partition_type,
spec,
&self.manifest_entry.data_file.partition,
)?;
Some(Arc::new(constant))
} else {
None
}
} else {
None
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is a good idea to calculate the partition column in the plan/scan phase and it adds build_partition_column_constant dep from record_batch_transformer' mod into the scan side.

#[serde(serialize_with = "serialize_not_implemented")]
#[serde(deserialize_with = "deserialize_not_implemented")]
#[builder(default)]
pub partition_column_constant: Option<Arc<PartitionColumnConstant>>,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the comment in https://github.com/apache/iceberg-rust/pull/2668/changes#r3448373215, I think it's better to pass the unified partition type here rather than passing the actual unified partition value. It would be easier for comet to pooling the type rather than the actual value.

BTW, this would be unnecessary if we can rebuild/access the table/metadata when reading on the executor side. Java archives this by SerializableTable and with Spark's broadcast. We don't have similar thing on the rust yet.

Comment on lines +118 to 121
let mut ids: Vec<i32> = value.identifier_field_ids.into_iter().collect();
ids.sort_unstable();
Some(ids)
},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes in this file seem unrelated? And I don't think the spec requires sorting identifier field ids.

Comment on lines +912 to +936
(DataType::LargeBinary, Some(PrimitiveLiteral::Binary(value))) => {
Arc::new(LargeBinaryArray::from_vec(vec![value; num_rows]))
}
(DataType::LargeBinary, None) => {
let vals: Vec<Option<&[u8]>> = vec![None; num_rows];
Arc::new(LargeBinaryArray::from_opt_vec(vals))
}
(DataType::FixedSizeBinary(len), Some(PrimitiveLiteral::Binary(value))) => {
let repeated: Vec<&[u8]> = vec![value.as_slice(); num_rows];
Arc::new(FixedSizeBinaryArray::try_from_iter(repeated.into_iter()).map_err(|e| {
Error::new(
ErrorKind::DataInvalid,
format!("Failed to create FixedSizeBinary({len}) array: {e}"),
)
})?)
}
(DataType::FixedSizeBinary(len), None) => {
let repeated: Vec<Option<&[u8]>> = vec![None; num_rows];
Arc::new(FixedSizeBinaryArray::try_from_sparse_iter_with_size(repeated.into_iter(), *len).map_err(|e| {
Error::new(
ErrorKind::DataInvalid,
format!("Failed to create null FixedSizeBinary({len}) array: {e}"),
)
})?)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seems not related?

Our internal integration also shows iceberg mapping Iceberg's binary to Arrow's LargeBinary though, which should also be updated.


// A struct column where each child is a constant primitive value.
// Used for the _partition metadata column.
AddStructConstant {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm. why this is not added to the constant fields? I think java's impl simply add partition's field constant map with a struct projection to mapping the task's partition data into the unified type?

@advancedxy

Copy link
Copy Markdown

@parthchandra thanks for pinging me and working on this. I think I'm concerned that the unified partition value is carried in the file scan task, which seems a bit of odd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants