Type casting error when reading files persisted with old schema for complex type #2617

@jordepic

Apache Iceberg Rust version

None

Describe the bug

When a table's struct (or nested list/map) column has gained fields over time through schema evolution, reading data files written under the older schema fails with an Arrow cast error such as Cast error: Casting from Utf8 to Struct(...). The record-batch transformer reconciles a file's nested children with the table schema by position within the struct rather than by Iceberg field id. Once a nested struct gains a field, the children no longer line up and a mismatched cast is attempted (e.g. casting a string child into a struct slot). The files themselves are valid and can be read by Iceberg-Java and Spark.

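For example, if a struct evolves from (a, c) to (a, b, c), then reading an old file that contains only a and c pairs the file's c with the table's b by position, and the reader tries to cast c to the type of b.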
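To make the (a, c) to (a, b, c) mismatch concrete, here is a minimal standard-library sketch that contrasts positional pairing with pairing by field id. It is not the iceberg-rust transformer code: `Field`, `match_by_position`, and `match_by_id` are hypothetical names used only for illustration, and the field ids (1, 2, 3) are assumed.

```rust
/// Hypothetical, simplified stand-in for a struct child:
/// an Iceberg field id plus a name (types are omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub id: i32,
    pub name: &'static str,
}

/// Pair each table field with the file field at the same index.
/// This mirrors the positional behavior described in this issue.
pub fn match_by_position<'a>(
    table: &'a [Field],
    file: &'a [Field],
) -> Vec<(&'a Field, Option<&'a Field>)> {
    table
        .iter()
        .enumerate()
        .map(|(i, t)| (t, file.get(i)))
        .collect()
}

/// Pair each table field with the file field that has the same field id.
/// A table field with no counterpart in the file maps to `None`.
pub fn match_by_id<'a>(
    table: &'a [Field],
    file: &'a [Field],
) -> Vec<(&'a Field, Option<&'a Field>)> {
    table
        .iter()
        .map(|t| (t, file.iter().find(|f| f.id == t.id)))
        .collect()
}
```

With a file struct of a (id 1), c (id 2) and a table struct of a (id 1), b (id 3), c (id 2), positional pairing hands the file's c to the table's b and leaves the table's c without a source. Pairing by id leaves b unmatched and keeps c aligned with c.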

To Reproduce

  1. Create a table with a column s struct<a: string> (plus other columns).
  2. Write a data file.
  3. Evolve the schema to s struct<a: string, b: long> (add a nested field), keeping field ids stable.
  4. Read the table (the older file still has s with only a).
  5. The scan errors with Cast error: Casting from Utf8 to Struct(...).

Expected behavior

Nested struct/list/map children are reconciled to the table schema by field id (recursively), and fields present in the table schema but absent from the file are materialized as typed NULLs — matching Iceberg's column-projection-by-id semantics. The read should succeed.
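The expected behavior can be sketched as a recursive walk over the table schema that looks up each child in the file schema by field id. The sketch below is a simplified, row-oriented model under stated assumptions, not the actual fix: `Type`, `NestedField`, `Value`, and `reconcile` are hypothetical names, maps are left out for brevity, and `Value::Null` stands in for what would be a null Arrow array of the table field's type in a columnar implementation.

```rust
/// Hypothetical, simplified type model (maps omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    String,
    Long,
    Struct(Vec<NestedField>),
    List(Box<NestedField>),
}

/// A named child carrying its Iceberg field id.
#[derive(Debug, Clone, PartialEq)]
pub struct NestedField {
    pub id: i32,
    pub name: String,
    pub ty: Type,
}

/// Row-oriented values; struct children are stored in schema order.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    String(String),
    Long(i64),
    Struct(Vec<Value>),
    List(Vec<Value>),
}

/// Reshape `value` (laid out according to `file_ty`) into the layout of `table_ty`.
/// Children are matched by field id, recursively; table fields that are
/// absent from the file become `Value::Null`.
pub fn reconcile(value: &Value, file_ty: &Type, table_ty: &Type) -> Value {
    match (value, file_ty, table_ty) {
        (Value::Struct(children), Type::Struct(file_fields), Type::Struct(table_fields)) => {
            Value::Struct(
                table_fields
                    .iter()
                    .map(|t| {
                        // Locate the file child with the same field id, if any.
                        file_fields
                            .iter()
                            .position(|f| f.id == t.id)
                            .and_then(|i| {
                                children
                                    .get(i)
                                    .map(|c| reconcile(c, &file_fields[i].ty, &t.ty))
                            })
                            .unwrap_or(Value::Null)
                    })
                    .collect(),
            )
        }
        (Value::List(items), Type::List(file_elem), Type::List(table_elem)) => Value::List(
            items
                .iter()
                .map(|v| reconcile(v, &file_elem.ty, &table_elem.ty))
                .collect(),
        ),
        // Primitives and nulls pass through unchanged in this sketch
        // (no type promotion is modeled here).
        _ => value.clone(),
    }
}
```

For the reproduction above, a row of s written as (a) and reconciled against s struct<a: string, b: long> comes back as (a, NULL) instead of triggering a cast between misaligned children.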

Willingness to contribute

I can contribute a fix for this bug independently


Labels: bug
