Support for projecting metadata columns _pos, _spec_id, and _partition in table scan #2607

@parthchandra

Is your feature request related to a problem or challenge?

These columns are required for row-level operations (DELETE, UPDATE, MERGE) in both copy-on-write and merge-on-read strategies:

| Column | CoW | MoR | Role |
|---|---|---|---|
| _file | required | row identity | Identifies the source data file |
| _pos | required | row identity | Identifies the row within the file |
| _spec_id | -- | required | Partition spec ID for delta writes |
| _partition | -- | required | Partition values for delta writes |

Without these, query engines cannot implement row-level mutations against Iceberg tables via iceberg-rust.

In iceberg-java, Spark's row-level operation API requires these columns:

  • Copy-on-Write (SparkCopyOnWriteOperation.requiredMetadataAttributes()): requests _file + _pos to identify which rows to rewrite during DELETE/UPDATE.
  • Merge-on-Read (SparkPositionDeltaOperation): uses _file + _pos as rowId() to uniquely identify rows, and requests _spec_id + _partition via requiredMetadataAttributes() so the delta writer knows which partition to write position delete files into.
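To make the two requirement sets concrete, here is a minimal sketch of how an engine on top of iceberg-rust might encode them. The `RowLevelStrategy` enum and `required_metadata_columns` function are hypothetical names used for illustration only; they are not part of iceberg-rust or iceberg-java.

```rust
/// Row-level write strategy, mirroring the two modes described above.
/// Hypothetical type for illustration; not an iceberg-rust API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowLevelStrategy {
    CopyOnWrite,
    MergeOnRead,
}

/// Metadata columns a scan would need to project for the given strategy.
/// Copy-on-write needs only the row identity columns; merge-on-read also
/// needs the partition information used when writing position deletes.
pub fn required_metadata_columns(strategy: RowLevelStrategy) -> &'static [&'static str] {
    match strategy {
        RowLevelStrategy::CopyOnWrite => &["_file", "_pos"],
        RowLevelStrategy::MergeOnRead => &["_file", "_pos", "_spec_id", "_partition"],
    }
}
```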

This pattern is consistent across the Spark integration modules in iceberg-java (v3.4 through v4.1). Any query engine (DataFusion Comet, etc.) building row-level operations on top of iceberg-rust will need these columns.

Describe the solution you'd like

1. _spec_id metadata column

Description: Constant per FileScanTask -- same pattern as _file.

Changes:

  • Add spec_id: i32 field to FileScanTask in scan/task.rs, populated from the manifest entry's partition_spec_id during scan planning
  • In pipeline.rs, inject as constant when projected:
    if task.project_field_ids().contains(&RESERVED_FIELD_ID_SPEC_ID) {
        let spec_id_datum = Datum::int(task.spec_id);
        builder = builder.with_constant(RESERVED_FIELD_ID_SPEC_ID, spec_id_datum);
    }
  • Add tests following the existing _file column test patterns

2. _pos metadata column

Description: The ordinal row position (0-based) within the source data file. Unlike _file and _spec_id, this is NOT a constant -- it increases monotonically across batches within a file.

Changes:

  • New ColumnSource variant in RecordBatchTransformer (e.g., RowPosition) that generates sequential Int64Array values
  • Mutable state in the transformer tracking the row offset across batches within a file. After each batch of N rows, start_offset += N.
  • Handle split reads: if a FileScanTask reads only part of a file, the initial position offset must account for the rows that precede the split (taken from the Parquet row group's row index offset)
  • In pipeline.rs, detect RESERVED_FIELD_ID_POS in projected fields and configure the transformer accordingly
  • Must use the same 0-based numbering semantics as positional delete files
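The offset tracking described in the list above could look roughly like the following sketch. `RowPositionSource` is a hypothetical stand-in for the proposed `RowPosition` column source; it returns a plain `Vec<i64>` to stay dependency-free, where a real implementation would presumably build an Arrow `Int64Array`. It also assumes that the rows of a batch are contiguous in the file, which would not hold if a row selection skips rows inside the batch.

```rust
/// Hypothetical stand-in for the proposed `RowPosition` column source.
/// Tracks the 0-based file position of the next row to emit.
#[derive(Debug)]
pub struct RowPositionSource {
    next_pos: i64,
}

impl RowPositionSource {
    /// `first_row` is the file-level index of the first row this task reads:
    /// 0 for a whole-file read, or the split's starting row otherwise.
    pub fn new(first_row: i64) -> Self {
        Self { next_pos: first_row }
    }

    /// Produce positions for a batch of `num_rows` rows, then advance the
    /// internal offset so the next batch continues where this one ended.
    pub fn next_batch(&mut self, num_rows: usize) -> Vec<i64> {
        let start = self.next_pos;
        let end = start + num_rows as i64;
        self.next_pos = end;
        (start..end).collect()
    }
}
```

One such source would be created per file (or per split), so the offset resets whenever the transformer moves to a new FileScanTask.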

Design considerations:

  • iceberg-rust currently handles positional deletes via a separate DeleteVector/RowSelection pre-filtering mechanism. The _pos column is architecturally independent but must agree on numbering.
  • In Java, PositionVectorReader gets rowStart from PageReadStore.getRowIndexOffset() per row group, then fills [rowStart, rowStart+1, ..., rowStart+N-1] per batch.
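For reference, the per-row-group fill described above can be sketched without any Parquet dependency. The sketch assumes that a row group's row index offset equals the total row count of all preceding row groups in the file; the function names `row_group_starts` and `batch_positions` are hypothetical.

```rust
/// Returns the file-level index of the first row of each row group,
/// given the row count of every row group in file order (a prefix sum).
pub fn row_group_starts(row_counts: &[i64]) -> Vec<i64> {
    let mut starts = Vec::with_capacity(row_counts.len());
    let mut acc = 0i64;
    for &count in row_counts {
        starts.push(acc);
        acc += count;
    }
    starts
}

/// Positions for one batch read from a row group that begins at `row_start`,
/// after `rows_already_read` rows of that row group were emitted earlier.
/// Yields [first, first + 1, ..., first + num_rows - 1].
pub fn batch_positions(row_start: i64, rows_already_read: i64, num_rows: usize) -> Vec<i64> {
    let first = row_start + rows_already_read;
    (first..(first + num_rows as i64)).collect()
}
```

Because positional delete files use the same 0-based file positions, values computed this way can be compared against them directly, even when a task starts at a later row group.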

3. _partition metadata column

Description: A struct column whose type is the union of all partition fields across all partition specs in the table (to handle partition evolution). Each row gets the partition values for its data file, with nulls for fields from other specs.

Changes:

  1. Compute the table-level partition type as a union struct of all partition fields across all specs. Equivalent to Java's Partitioning.partitionType(table). The function partition_field() in metadata_columns.rs already constructs a struct from partition fields -- may need a helper that collects fields across all specs.

  2. Propagate the unified partition type to FileScanTask so each task knows the full struct schema and which fields to null-fill for its specific spec.

  3. Materialize as a constant StructArray per batch:

    • Fields present in this file's partition spec get their values from partition data
    • Fields from other specs (partition evolution) get null
    • The struct is repeated (constant) for all rows in the batch

  4. Handle type coercion: partition values may need coercion to match the canonical partition type (Java uses StructProjection).

  5. Extend RecordBatchTransformer to support struct-typed constants. Currently create_primitive_array_repeated only handles primitives -- need equivalent struct array construction.
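Steps 1 through 3 can be sketched as follows. This is a simplified model, not iceberg-rust code: the `PartitionField` struct here is a hypothetical two-field stand-in, partition values are represented as strings rather than typed literals, and ordering the unified fields by field ID is an assumption of this sketch rather than a confirmed match for Java's `Partitioning.partitionType`.

```rust
use std::collections::{BTreeMap, HashMap};

/// Simplified, hypothetical stand-in for a partition field.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionField {
    pub field_id: i32,
    pub name: String,
}

/// Union of partition fields across all specs, deduplicated by field ID.
/// Ordering by field ID is an assumption made for this sketch.
pub fn unified_partition_fields(specs: &[Vec<PartitionField>]) -> Vec<PartitionField> {
    let mut by_id: BTreeMap<i32, PartitionField> = BTreeMap::new();
    for spec in specs {
        for field in spec {
            by_id.entry(field.field_id).or_insert_with(|| field.clone());
        }
    }
    by_id.into_values().collect()
}

/// Project one file's partition values (keyed by field ID) onto the unified
/// struct. Fields that belong only to other specs become `None` (null).
pub fn project_partition(
    unified: &[PartitionField],
    file_values: &HashMap<i32, String>,
) -> Vec<Option<String>> {
    unified
        .iter()
        .map(|field| file_values.get(&field.field_id).cloned())
        .collect()
}
```

The projected row is the same for every row of a file, so it would only need to be computed once per FileScanTask and then repeated for each batch.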

Willingness to contribute

I can contribute to this feature independently
