The iceberg-rust issue
When reading a manifest, ManifestMetadata::parse (spec/manifest/metadata.rs) parses the table schema and partition spec from that manifest's own schema / partition-spec Avro key-value metadata, and Manifest::try_from_avro_bytes then uses them to derive the partition type for decoding entries:
serde_json::from_slice::<Schema>(meta.get("schema"))? // hard error if not a valid Iceberg schema
let partition_type = metadata.partition_spec.partition_type(&metadata.schema)?;
This has two problems, independent of any particular writer:
- Redundant dependency. The scan already holds the authoritative TableMetadata (ObjectCache::get_manifest_list takes it, and TableMetadata::{schema_by_id, partition_spec_by_id} exist). A manifest's embedded schema/partition-spec is a redundant copy of what table metadata already provides by id.
- Out of step with the ecosystem. Other implementations don't read the manifest's schema key on the scan path:
  - iceberg-java: ManifestReader takes specsById (specs from table metadata); reading the schema from manifest file metadata is deprecated and slated for removal (1.12.0). The deprecation warning reads "Pass specsById to avoid reading from file metadata."
  - pyiceberg decodes via a fixed MANIFEST_ENTRY_SCHEMAS + the table-metadata schema.
  - iceberg-go decodes via the Avro writer schema.
So iceberg-rust is the only implementation that hard-depends on the manifest's self-described schema, which is both unnecessary and brittle.
Observable impact (the symptom)
Because of this, manifests whose schema key holds anything other than a valid Iceberg table schema are unreadable in iceberg-rust only — pyiceberg, Apache Doris, and Spark (iceberg-java) all read the same tables. For example, duckdb-iceberg serializes the manifest_entry Avro record schema into the schema key (using Avro type names like array/record), so iceberg-rust fails with:
Fail to parse schema in manifest metadata
→ data did not match any variant of untagged enum SchemaEnum
This is the symptom; the root concern is the redundant, ecosystem-divergent dependency above.
Proposed fix
Derive the schema + partition spec from the table metadata (by the manifest's schema-id / partition-spec-id) rather than the manifest's own keys, falling back to the manifest metadata when no table metadata is available — mirroring iceberg-java's ManifestReader(specsById). The scan already has the TableMetadataRef to thread down. PR opening shortly.
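A minimal sketch of the intended lookup order, not the actual patch. Schema, PartitionSpec, TableMetadata, ManifestKeys and resolve below are hypothetical stand-ins defined only for this illustration; only the names schema_by_id and partition_spec_by_id mirror the real TableMetadata methods mentioned above, and their signatures here are simplified. The sketch also assumes that an id missing from table metadata falls back to the manifest's embedded copy, which is a choice the PR may make differently.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real iceberg-rust types.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    schema_id: i32,
}

#[derive(Debug, Clone, PartialEq)]
struct PartitionSpec {
    spec_id: i32,
}

struct TableMetadata {
    schemas: HashMap<i32, Schema>,
    partition_specs: HashMap<i32, PartitionSpec>,
}

impl TableMetadata {
    // Simplified signatures; the real methods return shared references.
    fn schema_by_id(&self, id: i32) -> Option<&Schema> {
        self.schemas.get(&id)
    }

    fn partition_spec_by_id(&self, id: i32) -> Option<&PartitionSpec> {
        self.partition_specs.get(&id)
    }
}

// What the manifest's own key-value metadata yields: the ids, plus embedded
// copies whose parsing may have failed (Err holds the parse error message).
struct ManifestKeys {
    schema_id: i32,
    partition_spec_id: i32,
    embedded_schema: Result<Schema, String>,
    embedded_spec: Result<PartitionSpec, String>,
}

// Prefer table metadata (looked up by id); use the embedded copy only when
// table metadata is absent or does not contain the id (assumed behavior).
fn resolve(
    keys: &ManifestKeys,
    table: Option<&TableMetadata>,
) -> Result<(Schema, PartitionSpec), String> {
    let schema = match table.and_then(|t| t.schema_by_id(keys.schema_id)) {
        Some(s) => s.clone(),
        None => keys.embedded_schema.clone()?,
    };
    let spec = match table.and_then(|t| t.partition_spec_by_id(keys.partition_spec_id)) {
        Some(s) => s.clone(),
        None => keys.embedded_spec.clone()?,
    };
    Ok((schema, spec))
}
```

With this ordering, a manifest whose schema key fails to parse is still readable whenever the caller passes table metadata, and the parse error only surfaces on the fallback path where no table metadata is available.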