Is your feature request related to a problem or challenge?
These columns are required for row-level operations (DELETE, UPDATE, MERGE) in both copy-on-write and merge-on-read strategies:
| Column | CoW | MoR | Role |
|---|---|---|---|
| `_file` | required | row identity | Identifies the source data file |
| `_pos` | required | row identity | Identifies the row within the file |
| `_spec_id` | -- | required | Partition spec ID for delta writes |
| `_partition` | -- | required | Partition values for delta writes |
Without these, query engines cannot implement row-level mutations against Iceberg tables via iceberg-rust.
In iceberg-java, Spark's row-level operation API requires these columns:
- Copy-on-Write (`SparkCopyOnWriteOperation.requiredMetadataAttributes()`): requests `_file` + `_pos` to identify which rows to rewrite during DELETE/UPDATE.
- Merge-on-Read (`SparkPositionDeltaOperation`): uses `_file` + `_pos` as `rowId()` to uniquely identify rows, and requests `_spec_id` + `_partition` via `requiredMetadataAttributes()` so the delta writer knows which partition to write position delete files into.
This pattern is consistent across the Spark integration modules in iceberg-java (v3.4 through v4.1). Any query engine (DataFusion Comet, etc.) building row-level operations on top of iceberg-rust will need these columns.
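As a rough illustration of the table above, the per-strategy requirements could be encoded as follows. `RowLevelStrategy` and `required_metadata_columns` are hypothetical names used only for this sketch; they are not part of iceberg-rust or iceberg-java.

```rust
/// Row-level write strategies discussed in this issue.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowLevelStrategy {
    CopyOnWrite,
    MergeOnRead,
}

/// Metadata columns a scan would need to project for the given strategy,
/// following the table above.
pub fn required_metadata_columns(strategy: RowLevelStrategy) -> &'static [&'static str] {
    match strategy {
        // CoW only needs to locate the rows to rewrite.
        RowLevelStrategy::CopyOnWrite => &["_file", "_pos"],
        // MoR uses _file + _pos as row identity and additionally needs the
        // partition information to place position delete files.
        RowLevelStrategy::MergeOnRead => &["_file", "_pos", "_spec_id", "_partition"],
    }
}
```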
Describe the solution you'd like
1. _spec_id metadata column
Description: Constant per FileScanTask -- same pattern as _file.
Changes:
- Add `spec_id: i32` field to `FileScanTask` in `scan/task.rs`, populated from the manifest entry's `partition_spec_id` during scan planning
- In `pipeline.rs`, inject as constant when projected:

  ```rust
  if task.project_field_ids().contains(&RESERVED_FIELD_ID_SPEC_ID) {
      let spec_id_datum = Datum::int(task.spec_id);
      builder = builder.with_constant(RESERVED_FIELD_ID_SPEC_ID, spec_id_datum);
  }
  ```
- Add tests following the existing `_file` column test patterns
2. _pos metadata column
Description: The ordinal row position (0-based) within the source data file. Unlike _file and _spec_id, this is NOT a constant -- it increases monotonically across batches within a file.
Changes:
- New `ColumnSource` variant in `RecordBatchTransformer` (e.g., `RowPosition`) that generates sequential `Int64Array` values
- Mutable state in the transformer tracking the row offset across batches within a file. After each batch of N rows, `start_offset += N`.
- Handle split reads: if `FileScanTask` reads a portion of a file, the initial position offset must account for rows before the split (from Parquet row group's row index offset)
- In `pipeline.rs`, detect `RESERVED_FIELD_ID_POS` in projected fields and configure the transformer accordingly
- Must use the same 0-based numbering semantics as positional delete files
Design considerations:
- iceberg-rust currently handles positional deletes via a separate `DeleteVector`/`RowSelection` pre-filtering mechanism. The `_pos` column is architecturally independent but must agree on numbering.
- In Java, `PositionVectorReader` gets `rowStart` from `PageReadStore.getRowIndexOffset()` per row group, then fills `[rowStart, rowStart+1, ..., rowStart+N-1]` per batch.
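The offset tracking described above can be sketched with plain `Vec<i64>` in place of `Int64Array`. `RowPositionGenerator` and its methods are hypothetical names for this sketch, not existing iceberg-rust types. It assumes rows within a row group are emitted contiguously; if the reader skips rows between batches (for example through a `RowSelection`), a plain counter like this would drift and the positions would have to come from the reader instead.

```rust
/// Hypothetical helper that yields the `_pos` values for consecutive
/// batches read from one data file (0-based, file-level positions).
#[derive(Debug)]
pub struct RowPositionGenerator {
    /// File-level position of the next row to be emitted.
    next_pos: i64,
}

impl RowPositionGenerator {
    /// `first_row_index` is the file-level index of the first row this task
    /// reads: 0 for a whole-file read, or the row index offset of the first
    /// row group when the task covers only a split of the file.
    pub fn new(first_row_index: i64) -> Self {
        Self { next_pos: first_row_index }
    }

    /// Produce positions for a batch of `num_rows` rows and advance the
    /// internal offset (the `start_offset += N` step).
    pub fn next_batch(&mut self, num_rows: usize) -> Vec<i64> {
        let start = self.next_pos;
        let end = start + num_rows as i64;
        self.next_pos = end;
        (start..end).collect()
    }

    /// Jump to the first row of a new row group, for the case where the
    /// reader moves to a row group that is not adjacent to the previous one.
    pub fn seek(&mut self, row_group_first_row: i64) {
        self.next_pos = row_group_first_row;
    }
}
```

For a split whose first row group starts at row 100, `RowPositionGenerator::new(100)` followed by `next_batch(3)` and `next_batch(2)` yields positions 100 to 102 and then 103 to 104, which is the same per-batch fill the Java reader performs.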
3. _partition metadata column
Description: A struct column whose type is the union of all partition fields across all partition specs in the table (to handle partition evolution). Each row gets the partition values for its data file, with nulls for fields from other specs.
Changes:
- Compute the table-level partition type as a union struct of all partition fields across all specs. Equivalent to Java's `Partitioning.partitionType(table)`. The function `partition_field()` in `metadata_columns.rs` already constructs a struct from partition fields -- may need a helper that collects fields across all specs.
- Propagate the unified partition type to `FileScanTask` so each task knows the full struct schema and which fields to null-fill for its specific spec.
- Materialize as a constant `StructArray` per batch:
  - Fields present in this file's partition spec get their values from partition data
  - Fields from other specs (partition evolution) get null
  - The struct is repeated (constant) for all rows in the batch
- Handle type coercion: Partition values may need coercion to match the canonical partition type (Java uses `StructProjection`).
- Extend `RecordBatchTransformer` to support struct-typed constants. Currently `create_primitive_array_repeated` only handles primitives -- need equivalent struct array construction.
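The union-and-null-fill idea can be sketched without Arrow as below. Everything here is a simplification made for illustration: `PartField`, `unified_partition_fields`, and `project_partition` are hypothetical names, partition values are represented as strings rather than typed literals, and the first-seen field ordering is an assumption that may not match what `Partitioning.partitionType(table)` produces.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified partition field: only the field id and name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartField {
    pub field_id: i32,
    pub name: String,
}

/// Union of the partition fields of all specs, deduplicated by field id.
/// Fields are kept in first-seen order (an assumption of this sketch).
pub fn unified_partition_fields(specs: &[Vec<PartField>]) -> Vec<PartField> {
    let mut seen = HashSet::new();
    let mut unified = Vec::new();
    for spec in specs {
        for field in spec {
            // `insert` returns true only the first time an id is seen.
            if seen.insert(field.field_id) {
                unified.push(field.clone());
            }
        }
    }
    unified
}

/// Project one data file's partition values (keyed by partition field id)
/// onto the unified layout. Fields that are not part of the file's own
/// spec come out as `None`, i.e. null in the `_partition` struct.
pub fn project_partition(
    unified: &[PartField],
    file_values: &HashMap<i32, String>,
) -> Vec<Option<String>> {
    unified
        .iter()
        .map(|field| file_values.get(&field.field_id).cloned())
        .collect()
}
```

The projected row would then be repeated for every row of the batch, which is where struct-typed constant support in `RecordBatchTransformer` comes in.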
Willingness to contribute
I can contribute to this feature independently