Skip to content

HIVE-29016: Disable caching on the Iceberg REST Catalog #5871

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

okumin
Copy link
Contributor

@okumin okumin commented Jun 14, 2025

What changes were proposed in this pull request?

Remove caching on the Iceberg REST API server. It's mainly because the current code didn't work with caching correctly, as stated in HIVE-29016.

Also, I believe the REST Catalog service is not as mature as considering caching. Caching objects without proper consideration can be harmful, such as when users update an Iceberg table with both Thrift and REST APIs in parallel.

Why are the changes needed?

Remove caching.

Does this PR introduce any user-facing change?

No. This feature is not yet shipped.

How was this patch tested?

Unit test

@okumin okumin changed the title [WIP] HIVE-29016: Disable caching on the Iceberg REST Catalog HIVE-29016: Disable caching on the Iceberg REST Catalog Jun 14, 2025
@okumin okumin marked this pull request as ready for review June 14, 2025 16:08
Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link

@okumin
Copy link
Contributor Author

okumin commented Jun 16, 2025

@deniskuzZ If the +1 stands for the approval, I will merge this pull request

@deniskuzZ
Copy link
Member

@okumin, I'd like to loop in @henrib and @dengzhhu653 to reach a consensus

@henrib
Copy link
Contributor

henrib commented Jun 16, 2025

@okumin, I'd like to loop in @henrib and @dengzhhu653 to reach a consensus

Thanks @deniskuzZ :-)
Before we remove the code, doesn't disabling caching by setting the property hive.metastore.catalog.cache.expiry=-1 a valid workaround for the initial 'Trino creates table' issue ? I'd much rather having caching work by improving tx handling - and since we are in HMS, observe and invalidate cache entries in case of changes - than assume caching is not needed (loading these manifests is not without latency and IO rate can be quite demanding with many concurrent users).

May be worth trying overriding HMSCachingCatalog:buildTable():
@Override public Catalog.TableBuilder buildTable(TableIdentifier identifier, Schema schema) { super.tableCache.invalidate(identifier); return nsCatalog.buildTable(identifier, schema); }

@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 16, 2025

good idea with setting the property hive.metastore.catalog.cache.expiry=-1. This we can merge quickly

@okumin
Copy link
Contributor Author

okumin commented Jun 17, 2025

@henrib What client did you use while testing the feature?
I suppose it is not a Trino issue but the Hive's REST catalog issue, which means "hive.metastore.catalog.cache.expiry > 0" does not work with another system such as Spark. If my understanding is correct, I suppose we don't need to keep the current implementation. When caching is really necessary, we will implement a new mechanism with robust integration tests, and add the configuration back. I assume adding something is easy and removing or renaming exinsting ones are relatively hard

@henrib
Copy link
Contributor

henrib commented Jun 17, 2025

@okumin , setting the property hive.metastore.catalog.cache.expiry=-1 in the configuration disables the cache which solves the issue of creating/updating tables through the catalog triggering an error. The actual coding solution to that error might just be overriding buildTable() as mentioned above. Removing the caching code will just make querying - which is expected to be the most common operation - slower and less efficient; caching is already necessary at scale.
If overriding buildTable() seems too complicated or not sufficient, changing the default value for the property is enough to solve the issue for users updating tables through catalog without impeding querying usage for the most common case.

@okumin
Copy link
Contributor Author

okumin commented Jun 17, 2025

OK. Although I haven't tested the proposed solution, it's not a bad idea to disable it at this point. Is this expected?
#5878

@okumin okumin closed this Jun 17, 2025
@okumin okumin deleted the HIVE-29016-rest-cache branch June 17, 2025 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants