Apache Iceberg version
1.11.0 (latest release)
Query engine
Trino
Please describe the bug 🐞
When using rewrite_table_path to migrate an Iceberg table between clusters (e.g., different HDFS prefixes), the rewritten manifest list files contain stale manifest_length values. Specifically, RewriteTablePathUtil.rewriteManifestList() copies each ManifestFile entry and only updates field 0 (the path), leaving field 1 (the length) unchanged from the original:
// core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java
for (ManifestFile file : manifestFiles) {
  ManifestFile newFile = file.copy();
  ((StructLike) newFile).set(0, newPath(newFile.path(), sourcePrefix, targetPrefix));
  writer.add(newFile); // length (field 1) is NOT updated
  ...
}
The rewritten manifest .avro files have new path strings embedded inside them (data file paths with the new prefix). If the target prefix is longer than the source prefix, the rewritten manifest files are physically larger on disk than what manifest_length records in the manifest list. The Iceberg spec requires manifest_length (field 501) to reflect the actual byte length of the manifest file.
Trino's Iceberg connector enforces this field strictly and fails with:
Incorrect file size for file <manifest.avro> (end of stream not reached)
Spark does not validate manifest_length and succeeds.
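The mismatch can also be checked without any query engine by comparing the recorded manifest_length against the size of the rewritten manifest file. The following is a minimal sketch, assuming a local copy of the manifest and a recorded length supplied by the caller (reading the value out of the manifest list is out of scope here). The class and method names are hypothetical and are not part of Iceberg or Trino.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not an Iceberg or Trino API.
final class ManifestLengthCheck {
  private ManifestLengthCheck() {}

  /**
   * Returns (actual file size - recorded length) in bytes.
   * Zero means the recorded manifest_length is consistent with the file;
   * a positive value means the file is larger than the manifest list claims.
   */
  static long lengthDelta(Path manifest, long recordedLength) {
    try {
      return Files.size(manifest) - recordedLength;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Arguments: <path to a local copy of the manifest> <recorded manifest_length>
  public static void main(String[] args) {
    Path manifest = Paths.get(args[0]);
    long recorded = Long.parseLong(args[1]);
    long delta = lengthDelta(manifest, recorded);
    if (delta == 0) {
      System.out.println("OK: recorded length matches file size");
    } else {
      System.out.println("MISMATCH: file size differs from manifest_length by " + delta + " bytes");
    }
  }
}
```

A positive delta on a manifest produced by rewrite_table_path would be consistent with the behavior described above.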
To Reproduce
- Have an Iceberg table on cluster A with prefix hdfs://cluster-a/warehouse/.
- Run rewrite_table_path with source_prefix = hdfs://cluster-a/warehouse/ and target_prefix = hdfs://cluster-b/longer-warehouse-path/ (target prefix is longer).
- distcp the staged files to cluster B and register the table.
- Query the table (any snapshot) with Trino → fails with Incorrect file size for file (end of stream not reached).
- Query the same table with Spark → succeeds.
Additional context: rewrite_manifests does not fix historical snapshots
Running rewrite_manifests after the migration does not resolve the issue for historical snapshots. BaseRewriteManifests.apply() only operates on base.currentSnapshot(), so it creates a new snapshot with correct manifest sizes, but all prior snapshots still point to manifest lists with stale manifest_length values. Querying any old snapshot via Trino will continue to fail.
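Because rewrite_manifests cannot repair older snapshots, a fix presumably belongs in the manifest list rewrite itself. The sketch below shows one possible shape: when each entry is rewritten, record the byte length of the rewritten manifest instead of carrying the original length over. It uses hypothetical stand-in types rather than Iceberg's classes. `Entry`, `swapPrefix`, `rewriteEntries`, and the `rewrittenLengths` map are assumptions for illustration, and the real change would depend on the rewritten manifest sizes being available inside RewriteTablePathUtil.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these types are Iceberg APIs.
final class ManifestListRewriteSketch {
  private ManifestListRewriteSketch() {}

  /** Hypothetical stand-in for one manifest list entry (path + manifest_length). */
  static final class Entry {
    final String path;
    final long length;

    Entry(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  /** Simplified prefix swap; Iceberg's own newPath helper is not reproduced here. */
  static String swapPrefix(String path, String sourcePrefix, String targetPrefix) {
    if (!path.startsWith(sourcePrefix)) {
      throw new IllegalArgumentException("Path does not start with source prefix: " + path);
    }
    return targetPrefix + path.substring(sourcePrefix.length());
  }

  /**
   * Rewrites every entry's path and replaces its length with the size of the
   * rewritten manifest, looked up by target path in rewrittenLengths.
   */
  static List<Entry> rewriteEntries(
      List<Entry> entries,
      String sourcePrefix,
      String targetPrefix,
      Map<String, Long> rewrittenLengths) {
    List<Entry> result = new ArrayList<>();
    for (Entry entry : entries) {
      String targetPath = swapPrefix(entry.path, sourcePrefix, targetPrefix);
      Long actualLength = rewrittenLengths.get(targetPath);
      if (actualLength == null) {
        // Fail loudly rather than silently keeping the stale length.
        throw new IllegalStateException("No rewritten size known for " + targetPath);
      }
      result.add(new Entry(targetPath, actualLength));
    }
    return result;
  }
}
```

Failing when a size is missing, rather than falling back to the original length, is a deliberate choice in this sketch: a stale length is exactly the condition that strict readers reject.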
Environment
- Apache Iceberg version: 1.11.0
- Query engine: Trino (fails), Spark (succeeds)
- Source/target filesystem: HDFS → HDFS (different clusters)
Willingness to contribute