Search before asking
What would you like to be improved?
Hi Amoro community,
We found two compatibility issues between the Amoro Iceberg REST Catalog and the Trino Iceberg connector.
These issues make Trino DDL unreliable: `CREATE TABLE` may return an error to the client, while the table is actually created successfully in Amoro and can be queried afterward.
## Environment
- Amoro: 0.6.1
- Object storage: MinIO / S3-compatible storage
- Query engine: Trino
- Catalog type: Iceberg REST Catalog
- Amoro REST Catalog URI:

```properties
iceberg.rest-catalog.uri=http://<amoro-host>:1630/api/iceberg/rest
iceberg.rest-catalog.warehouse=lakehouse_catalog
```

Trino catalog config example:

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://<amoro-host>:1630/api/iceberg/rest
iceberg.rest-catalog.warehouse=lakehouse_catalog
iceberg.unique-table-location=true
iceberg.file-format=PARQUET
fs.native-s3.enabled=true
s3.endpoint=http://<minio-host>:9000
s3.region=us-east-1
s3.path-style-access=true
s3.aws-access-key=***
s3.aws-secret-key=***
```

## Issue 1: namespace properties are not preserved or returned

When creating a namespace with `location`:

```sql
CREATE SCHEMA iceberg_minio.dwd_stage_v2
WITH (
    location = 's3://enerpoch-pub-bucket/warehouse/dwd_stage_v2'
);
```

or calling the Iceberg REST API directly:

```bash
curl -s -X POST \
  "http://<amoro-host>:1630/api/iceberg/rest/v1/catalogs/lakehouse_catalog/namespaces" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": ["diag_stage_loc"],
    "properties": {
      "location": "s3://enerpoch-pub-bucket/warehouse/diag_stage_loc"
    }
  }'
```

Amoro returns:

```json
{
  "namespace": ["diag_stage_loc"],
  "properties": {}
}
```

Loading the namespace also returns empty properties:

```bash
curl -s \
  "http://<amoro-host>:1630/api/iceberg/rest/v1/catalogs/lakehouse_catalog/namespaces/diag_stage_loc"
```

Response:

```json
{
  "namespace": ["diag_stage_loc"],
  "properties": {}
}
```

Because of this, Trino cannot create a table without an explicit table-level location:

```sql
CREATE TABLE iceberg_minio.diag_stage_loc.test_tbl (
    id BIGINT
)
WITH (
    format = 'PARQUET'
);
```

Trino reports:

```text
location must be set for diag_stage_loc
```

Expected behavior:

Amoro should preserve and return namespace properties, at least the `location` property:

```json
{
  "namespace": ["diag_stage_loc"],
  "properties": {
    "location": "s3://enerpoch-pub-bucket/warehouse/diag_stage_loc"
  }
}
```
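
For anyone who wants to script the check above, here is a minimal Python sketch of the same round trip. The base URL and the `catalogs/lakehouse_catalog` prefix are copied from the curl examples; the helper names (`missing_properties`, `check_namespace_round_trip`) are our own and are not part of Amoro or Iceberg.

```python
import json
import urllib.request

# Assumption: same host placeholder and catalog prefix as the curl examples.
BASE = "http://<amoro-host>:1630/api/iceberg/rest/v1/catalogs/lakehouse_catalog"


def missing_properties(requested: dict, returned: dict) -> dict:
    """Return the requested properties that the server dropped or changed."""
    return {k: v for k, v in requested.items() if returned.get(k) != v}


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_json(url: str) -> dict:
    """GET a URL and decode the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def check_namespace_round_trip(base: str, namespace: str, properties: dict) -> dict:
    """Create a namespace, load it again, and report which properties were lost."""
    created = post_json(
        f"{base}/namespaces",
        {"namespace": [namespace], "properties": properties},
    )
    loaded = get_json(f"{base}/namespaces/{namespace}")
    return {
        "lost_on_create": missing_properties(properties, created.get("properties") or {}),
        "lost_on_load": missing_properties(properties, loaded.get("properties") or {}),
    }
```

Calling `check_namespace_round_trip(BASE, "diag_stage_loc", {"location": "s3://enerpoch-pub-bucket/warehouse/diag_stage_loc"})` against a server that behaves as described in this issue should report `location` under both keys; with the expected behavior both dictionaries would be empty.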

## Issue 2: CREATE TABLE with explicit location returns non-empty location error, but the table is created successfully

When creating a table with an explicit location:

```sql
CREATE TABLE iceberg_minio.dwd_stage.aaa (
    id BIGINT,
    name VARCHAR,
    create_time TIMESTAMP(6)
)
WITH (
    format = 'PARQUET',
    location = 's3://enerpoch-pub-bucket/warehouse/dwd_stage/aaa'
);
```

Trino reports:

```text
Cannot create a table on a non-empty location:
s3://enerpoch-pub-bucket/warehouse/dwd_stage/aaa,
set 'iceberg.unique-table-location=true' in your Iceberg catalog properties
to use unique table locations for every table.
```
However, the table is actually created successfully:
- The table exists in Amoro.
- Iceberg metadata files exist in object storage.
- Trino can query the table afterward.
So the DDL result seen by the client is failure, while the server-side result is success.
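
The mismatch can be confirmed from outside Trino by loading the table through the REST catalog after the DDL has failed. The sketch below assumes the load-table route sits under the same `catalogs/lakehouse_catalog` prefix as the namespace routes used earlier; `table_url`, `table_exists`, and `classify_ddl_outcome` are illustrative helper names, not existing APIs.

```python
import urllib.error
import urllib.request


def table_url(base: str, namespace: str, table: str) -> str:
    """Build the load-table URL (assumed to follow the namespace URL layout)."""
    return f"{base}/namespaces/{namespace}/tables/{table}"


def table_exists(base: str, namespace: str, table: str) -> bool:
    """Return True if the catalog can load the table, False on HTTP 404."""
    try:
        with urllib.request.urlopen(table_url(base, namespace, table)) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def classify_ddl_outcome(client_saw_error: bool, exists: bool) -> str:
    """Compare what the client was told with what the catalog actually holds."""
    if client_saw_error and exists:
        return "inconsistent: client saw failure, table exists"
    if client_saw_error:
        return "failed"
    if exists:
        return "created"
    return "inconsistent: client saw success, table missing"
```

For the case in this issue, `classify_ddl_outcome(True, table_exists(base, "dwd_stage", "aaa"))` would be expected to land in the first branch.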

## Source code analysis

### Amoro 0.6.1

In Amoro 0.6.1, `IcebergRestCatalogService#createTable` creates Iceberg metadata and then registers the table in AMS.

Related files:

- `ams/server/src/main/java/com/netease/arctic/server/IcebergRestCatalogService.java`
- `ams/server/src/main/java/com/netease/arctic/server/utils/IcebergTableUtil.java`

The create table flow writes the initial metadata file:

```java
OutputFile outputFile = io.newOutputFile(metadataFileLocation);
TableMetadataParser.overwrite(icebergTableMetadata, outputFile);
```

The generated metadata path is under the final table location:

```text
<table-location>/metadata/v00001-xxx.metadata.json
```

After the REST call returns, Trino checks whether the table location is empty. Since Amoro has already written the metadata file, Trino sees the location as non-empty and throws the error.
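
To make the ordering problem concrete, here is a small in-memory model of the sequence described above. It is only a sketch: `FakeObjectStore`, `catalog_create_table`, and `client_create_table` are hypothetical stand-ins for MinIO, Amoro, and Trino, the bucket path is a placeholder, and none of this is code from either project.

```python
class FakeObjectStore:
    """In-memory stand-in for S3/MinIO: maps object keys to bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def is_empty(self, prefix):
        """True when no object key lives under the given location prefix."""
        prefix = prefix.rstrip("/") + "/"
        return not any(key.startswith(prefix) for key in self.objects)


def catalog_create_table(store, location):
    """Server-side model: write the initial metadata under the final location."""
    # The file name mirrors the path pattern quoted above; it is a placeholder.
    key = f"{location}/metadata/v00001-xxx.metadata.json"
    store.put(key, b"{}")
    return key


def client_create_table(store, location):
    """Client-side model: call the catalog, then check that the location is empty."""
    catalog_create_table(store, location)
    if not store.is_empty(location):
        raise RuntimeError(
            f"Cannot create a table on a non-empty location: {location}"
        )


if __name__ == "__main__":
    store = FakeObjectStore()
    location = "s3://example-bucket/warehouse/db/tbl"  # placeholder path
    try:
        client_create_table(store, location)
    except RuntimeError as err:
        print("client sees:", err)
    print("metadata left behind:", sorted(store.objects))
```

In this model the client always raises, yet the metadata object stays in the store, which matches the "client failure, server success" outcome reported in Issue 2.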

### Higher versions checked

We also checked the official Apache Amoro source code for:

- `v0.8.1-incubating`
- `v0.9.0-rc3`
- current `master`

The namespace properties issue still exists. `setNamespaceProperties` still throws:

```java
throw new UnsupportedOperationException("namespace properties is not supported");
```

The create table path was refactored to `InternalTableCreator`, but the behavior is still similar.

Related file:

- `amoro-ams/src/main/java/org/apache/amoro/server/table/internal/InternalIcebergCreator.java`

The initial metadata is still written to the final table location:

```java
String icebergMetadataFileLocation =
    InternalTableUtil.genNewMetadataFileLocation(null, icebergMetadata);
OutputFile outputFile = io.newOutputFile(icebergMetadataFileLocation);
TableMetadataParser.overwrite(icebergMetadata, outputFile);
```

We did not find special handling for:
Therefore, upgrading from 0.6.1 to 0.8.1 or 0.9.0-rc3 does not appear to resolve this Trino compatibility issue.

## Why this seems better fixed in Amoro

Trino's non-empty location check is a safety check to avoid creating a table on top of existing files or existing Iceberg metadata.
The problem is that Amoro writes metadata into the final table location before Trino finishes its create table flow. Therefore, relaxing Trino's check would reduce safety for all catalogs, while fixing Amoro would make the REST Catalog behavior more compatible and predictable.
How should we improve?

## Suggested improvements

Could Amoro consider supporting the following behavior?

- Preserve namespace properties passed by `CreateNamespaceRequest.properties()`.
- Return namespace properties from the load namespace endpoint.
- Implement the namespace properties update endpoint.
- Respect Iceberg REST `CreateTableRequest.stageCreate()` semantics.
- Avoid writing initial metadata into the final table location during staged create.
- Ensure create table is transactional or at least idempotent from the REST client perspective.

Expected result:

- `CREATE SCHEMA ... WITH (location=...)` should preserve `location`.
- `CREATE TABLE` without explicit table-level location should work when namespace has a location.
- `CREATE TABLE ... WITH (location=...)` should not return a failure after the table has already been created successfully.

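
As a rough illustration of the behavior we are asking for, the following in-memory model keeps namespace properties and defers all storage writes when a create is staged. `SketchCatalog` and its methods are hypothetical names chosen for this sketch and do not correspond to Amoro classes; the metadata file name is a placeholder.

```python
class SketchCatalog:
    """Hypothetical in-memory model of the requested behavior (not Amoro code)."""

    def __init__(self):
        self.namespaces = {}     # namespace name -> properties
        self.tables = {}         # (namespace, table) -> metadata
        self.written_files = []  # metadata paths written to storage

    def create_namespace(self, name, properties=None):
        """Store the properties and echo them back in the response."""
        self.namespaces[name] = dict(properties or {})
        return self.load_namespace(name)

    def load_namespace(self, name):
        """Return the stored properties instead of an empty map."""
        return {"namespace": [name], "properties": dict(self.namespaces[name])}

    def create_table(self, namespace, table, location, stage_create=False):
        """Create a table, or only prepare its metadata when staged."""
        metadata = {"location": location}
        if stage_create:
            # Staged create: nothing is written or registered yet.
            return metadata
        self._commit(namespace, table, metadata)
        return metadata

    def commit_staged(self, namespace, table, metadata):
        """Finish a staged create once the client commits."""
        self._commit(namespace, table, metadata)

    def _commit(self, namespace, table, metadata):
        # Placeholder file name; only here does the final location receive a file.
        path = f"{metadata['location']}/metadata/v00001-xxx.metadata.json"
        self.written_files.append(path)
        self.tables[(namespace, table)] = metadata
```

With this shape, a client that stages the create and checks the location before committing would still find it empty, and the namespace `location` survives a create/load round trip.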
Thanks.