feat: add support for _partition metadata column#2668
Conversation
|
@advancedxy fyi |
074f903 to
caa7fb0
Compare
| /// # Arguments | ||
| /// * `partition_specs` - Iterator over all partition specs in the table | ||
| /// * `schema` - The current table schema (needed to determine result types of transforms) | ||
| pub fn compute_unified_partition_type<'a>( |
There was a problem hiding this comment.
nit: it's a bit of odd that this function is added in metadata_column.rs. Do you think it's a good idea to create partitioning.rs and put this function into partitioning.rs?
| let mut seen_field_ids = std::collections::HashSet::new(); | ||
| let mut struct_fields: Vec<NestedFieldRef> = Vec::new(); | ||
|
|
||
| for spec in partition_specs { |
There was a problem hiding this comment.
I notice some inconsistent with java's impl:
- unknown specs are rejected in java.
- specs are sorted by spec id first(in reverse order), which means newer partition spec's field name will be picked fist.
- V1 table's void transform field is also handled in java: the partition field that was dropped later.
| let partition_column_constant = | ||
| if let Some(ref unified_partition_type) = self.unified_partition_type { | ||
| let partition_spec = self | ||
| .table_metadata | ||
| .partition_spec_by_id(self.partition_spec_id); | ||
| if let Some(spec) = partition_spec { | ||
| let constant = build_partition_column_constant( | ||
| unified_partition_type, | ||
| spec, | ||
| &self.manifest_entry.data_file.partition, | ||
| )?; | ||
| Some(Arc::new(constant)) | ||
| } else { | ||
| None | ||
| } | ||
| } else { | ||
| None | ||
| }; |
There was a problem hiding this comment.
I'm not sure this is a good idea to calculate the partition column in the plan/scan phase and it adds build_partition_column_constant dep from record_batch_transformer' mod into the scan side.
| #[serde(serialize_with = "serialize_not_implemented")] | ||
| #[serde(deserialize_with = "deserialize_not_implemented")] | ||
| #[builder(default)] | ||
| pub partition_column_constant: Option<Arc<PartitionColumnConstant>>, |
There was a problem hiding this comment.
Like the comment in https://github.com/apache/iceberg-rust/pull/2668/changes#r3448373215, I think it's better to pass the unified partition type here rather than passing the actual unified partition value. It would be easier for comet to pooling the type rather than the actual value.
BTW, this would be unnecessary if we can rebuild/access the table/metadata when reading on the executor side. Java archives this by SerializableTable and with Spark's broadcast. We don't have similar thing on the rust yet.
| let mut ids: Vec<i32> = value.identifier_field_ids.into_iter().collect(); | ||
| ids.sort_unstable(); | ||
| Some(ids) | ||
| }, |
There was a problem hiding this comment.
the changes in this file seem unrelated? And I don't think the spec requires sorting identifier field ids.
| (DataType::LargeBinary, Some(PrimitiveLiteral::Binary(value))) => { | ||
| Arc::new(LargeBinaryArray::from_vec(vec![value; num_rows])) | ||
| } | ||
| (DataType::LargeBinary, None) => { | ||
| let vals: Vec<Option<&[u8]>> = vec![None; num_rows]; | ||
| Arc::new(LargeBinaryArray::from_opt_vec(vals)) | ||
| } | ||
| (DataType::FixedSizeBinary(len), Some(PrimitiveLiteral::Binary(value))) => { | ||
| let repeated: Vec<&[u8]> = vec![value.as_slice(); num_rows]; | ||
| Arc::new(FixedSizeBinaryArray::try_from_iter(repeated.into_iter()).map_err(|e| { | ||
| Error::new( | ||
| ErrorKind::DataInvalid, | ||
| format!("Failed to create FixedSizeBinary({len}) array: {e}"), | ||
| ) | ||
| })?) | ||
| } | ||
| (DataType::FixedSizeBinary(len), None) => { | ||
| let repeated: Vec<Option<&[u8]>> = vec![None; num_rows]; | ||
| Arc::new(FixedSizeBinaryArray::try_from_sparse_iter_with_size(repeated.into_iter(), *len).map_err(|e| { | ||
| Error::new( | ||
| ErrorKind::DataInvalid, | ||
| format!("Failed to create null FixedSizeBinary({len}) array: {e}"), | ||
| ) | ||
| })?) | ||
| } |
There was a problem hiding this comment.
these seems not related?
Our internal integration also shows iceberg mapping Iceberg's binary to Arrow's LargeBinary though, which should also be updated.
|
|
||
| // A struct column where each child is a constant primitive value. | ||
| // Used for the _partition metadata column. | ||
| AddStructConstant { |
There was a problem hiding this comment.
hmmm. why this is not added to the constant fields? I think java's impl simply add partition's field constant map with a struct projection to mapping the task's partition data into the unified type?
|
@parthchandra thanks for pinging me and working on this. I think I'm concerned that the unified partition value is carried in the file scan task, which seems a bit of odd. |
Which issue does this PR close?
What changes are included in this PR?
Implements the _partition metadata column for table scans. This is a struct column whose type is the union of all partition fields across all partition specs (handling partition evolution). Each row gets the
partition values for its data file.
Are these changes tested?
Because we do not have write support yet, I made the corresponding change to comet and then tested by adding tests in Comet which uses iceberg-java to write files and then iceberg-rust to read them back.
https://github.com/parthchandra/datafusion-comet/blob/iceberg-metadata-columns/spark/src/test/resources/sql-tests/iceberg/metadata_column_partition.sql