Manifest reader hard-depends on the manifest `schema`/`partition-spec` keys; should derive from table metadata (cf. iceberg-java `specsById`)

## The iceberg-rust issue

When reading a manifest, `ManifestMetadata::parse` (`spec/manifest/metadata.rs`) parses the **table schema** and **partition spec** from that manifest's *own* `schema` / `partition-spec` Avro key-value metadata, and `Manifest::try_from_avro_bytes` then uses them to derive the partition type for decoding entries:

```rust
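// Simplified: the manifest's own `schema` key is parsed as an Iceberg schema (hard error if it is not one).
serde_json::from_slice::<Schema>(meta.get("schema"))?;
// The partition type is then derived from the manifest's embedded spec and schema.
let partition_type = metadata.partition_spec.partition_type(&metadata.schema)?;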
```

This has two problems, independent of any particular writer:

1. **Redundant dependency.** The scan already holds the authoritative `TableMetadata` (`ObjectCache::get_manifest_list` takes it, and `TableMetadata::{schema_by_id, partition_spec_by_id}` exist). A manifest's embedded `schema`/`partition-spec` is a redundant copy of what table metadata already provides by id.
2. **Out of step with the ecosystem.** Other implementations don't read the manifest's `schema` key on the scan path:
   - **iceberg-java** `ManifestReader` takes `specsById` (specs from table metadata); reading the schema from manifest file metadata is **deprecated and slated for removal (1.12.0)** — the warning is literally *"Pass specsById to avoid reading from file metadata."*
   - **pyiceberg** decodes via a fixed `MANIFEST_ENTRY_SCHEMAS` + the table-metadata schema.
   - **iceberg-go** decodes via the Avro writer schema.

So, among the implementations compared above, iceberg-rust is the only one that hard-depends on the manifest's self-described schema, which is both unnecessary and brittle.

## Observable impact (the symptom)

Because of this, manifests whose `schema` key holds anything other than a valid Iceberg table schema are unreadable in iceberg-rust **only** — pyiceberg, Apache Doris, and Spark (iceberg-java) all read the same tables. For example, **duckdb-iceberg** serializes the manifest_entry Avro record schema into the `schema` key (using Avro type names like `array`/`record`), so iceberg-rust fails with:

```
Fail to parse schema in manifest metadata
  → data did not match any variant of untagged enum SchemaEnum
```

This is the symptom; the root concern is the redundant, ecosystem-divergent dependency above.

## Proposed fix

Derive the schema and partition spec from the **table metadata**, looked up by the manifest's `schema-id` / `partition-spec-id`, rather than from the manifest's own keys. Fall back to the manifest metadata only when no table metadata is available. This mirrors iceberg-java's `ManifestReader(specsById)`. The scan already has the `TableMetadataRef` to thread down. PR opening shortly.
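As a rough illustration of the lookup order, here is a minimal sketch. It does not use the real iceberg-rust types: `Schema`, `PartitionSpec`, `TableMetadata`, `ManifestKeys` and `resolve` are hypothetical stand-ins, and only the method names `schema_by_id` / `partition_spec_by_id` are borrowed from the issue text. It also assumes the embedded keys are parsed lazily and that an id missing from table metadata falls back to the embedded copy; the actual PR may choose differently.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins for the real schema and partition spec types.
#[derive(Debug, PartialEq)]
struct Schema {
    schema_id: i32,
}

#[derive(Debug, PartialEq)]
struct PartitionSpec {
    spec_id: i32,
}

// Hypothetical stand-in for table metadata, indexed by id.
struct TableMetadata {
    schemas: HashMap<i32, Arc<Schema>>,
    partition_specs: HashMap<i32, Arc<PartitionSpec>>,
}

impl TableMetadata {
    fn schema_by_id(&self, id: i32) -> Option<&Arc<Schema>> {
        self.schemas.get(&id)
    }

    fn partition_spec_by_id(&self, id: i32) -> Option<&Arc<PartitionSpec>> {
        self.partition_specs.get(&id)
    }
}

// What a manifest's key-value metadata yields. The embedded copies keep their
// parse result, so a malformed `schema` key only fails if it is actually needed.
struct ManifestKeys {
    schema_id: i32,
    partition_spec_id: i32,
    embedded_schema: Result<Arc<Schema>, String>,
    embedded_spec: Result<Arc<PartitionSpec>, String>,
}

// Prefer table metadata (looked up by id); fall back to the manifest's own keys
// when table metadata is absent or (an assumption of this sketch) lacks the id.
fn resolve(
    keys: &ManifestKeys,
    table: Option<&TableMetadata>,
) -> Result<(Arc<Schema>, Arc<PartitionSpec>), String> {
    let schema = match table.and_then(|t| t.schema_by_id(keys.schema_id)) {
        Some(schema) => Arc::clone(schema),
        None => keys.embedded_schema.clone()?,
    };
    let spec = match table.and_then(|t| t.partition_spec_by_id(keys.partition_spec_id)) {
        Some(spec) => Arc::clone(spec),
        None => keys.embedded_spec.clone()?,
    };
    Ok((schema, spec))
}
```

With this ordering, a manifest whose `schema` key does not parse as an Iceberg schema still resolves whenever table metadata is supplied; the parse error only surfaces on the fallback path.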

