Is your feature request related to a problem or challenge?
Feature request.
Background
compute_identity_cols in crates/integrations/datafusion/src/table/bucketing.rs (added in #2298) returns None whenever a table has more than one historical partition spec, which forces the eager scan to declare UnknownPartitioning.
This is safe but stricter than iceberg-java, which intersects the identity fields present across all specs (Partitioning.groupingKeyType / commonActiveFieldIds) and still reports a grouping key on the columns that are identity-partitioned in every spec.
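A minimal sketch of that intersection, to make the idea concrete. The Transform, Field, and Spec types below are simplified, hypothetical stand-ins rather than the iceberg-rust or iceberg-java types, and keying the intersection on the source column id is an assumption of this sketch:

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified partition transform (not the iceberg-rust type).
#[derive(Debug, Clone, PartialEq)]
enum Transform {
    Identity,
    Bucket(u32),
    Day,
}

/// Hypothetical partition field: a source column id plus its transform.
#[derive(Debug, Clone)]
struct Field {
    source_id: i32,
    transform: Transform,
}

/// Hypothetical partition spec: an ordered list of partition fields.
#[derive(Debug, Clone)]
struct Spec {
    fields: Vec<Field>,
}

/// Returns the source column ids that are identity-partitioned in every spec.
/// An empty spec list yields an empty set.
fn common_identity_source_ids(specs: &[Spec]) -> BTreeSet<i32> {
    // Identity source ids of each spec, one set per spec.
    let mut per_spec = specs.iter().map(|spec| {
        spec.fields
            .iter()
            .filter(|f| f.transform == Transform::Identity)
            .map(|f| f.source_id)
            .collect::<BTreeSet<i32>>()
    });

    match per_spec.next() {
        None => BTreeSet::new(),
        // Narrow the first spec's set by intersecting with every later spec.
        Some(first) => per_spec.fold(first, |acc, ids| {
            acc.intersection(&ids).copied().collect()
        }),
    }
}
```

For a table whose first spec is (identity(col 1), day(col 2)) and whose second spec is (bucket(col 3), identity(col 1), identity(col 2)), only column 1 survives the intersection, so only column 1 could back a grouping key.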
Why it's conservative today
The eager bucketing path hashes each task on the partition-tuple slot that matches the table's default spec. Under spec evolution, older files carry a partition tuple whose slot order does not necessarily align with the default spec, and FileScanTask does not currently carry its own spec id to disambiguate. A per-column intersection was attempted in e0d6add and reverted in f25c911 as out of scope for #2298.
Describe the solution you'd like
Match iceberg-java: compute the intersection of identity-source fields common to every spec and declare Partitioning::Hash on those columns, resolving each task's partition slot via its own spec id rather than assuming the default spec's slot order.
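A rough sketch of the per-task slot resolution, under stated assumptions: Task, SlotLayout, and grouping_key are hypothetical names, partition values are simplified to strings, and the task is assumed to carry its spec id (which FileScanTask does not do today):

```rust
use std::collections::HashMap;

/// Hypothetical layout of one spec's partition tuple: for each slot, the
/// source column id if the slot is an identity field, otherwise None.
type SlotLayout = Vec<Option<i32>>;

/// Hypothetical scan task that carries its own spec id alongside the
/// partition tuple (values simplified to strings for this sketch).
struct Task {
    spec_id: i32,
    partition: Vec<String>,
}

/// Builds the grouping key for a task by looking up each key column's slot
/// in the task's own spec, instead of assuming the default spec's order.
/// Returns None if the spec is unknown or a key column has no slot.
fn grouping_key(
    task: &Task,
    layouts: &HashMap<i32, SlotLayout>,
    key_cols: &[i32],
) -> Option<Vec<String>> {
    let layout = layouts.get(&task.spec_id)?;
    key_cols
        .iter()
        .map(|col| {
            // Find the slot that holds this column under the task's spec.
            let slot = layout.iter().position(|s| *s == Some(*col))?;
            task.partition.get(slot).cloned()
        })
        .collect()
}
```

With this shape, two tasks written under different specs produce the same key for the same column value even when that value sits in different slots, which is the property the hash partitioning declaration would rely on.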
Follow-up to #2298.
Willingness to contribute
I would be willing to contribute to this feature with guidance from the Iceberg Rust community