Add pathStyleAccess to AwsStorageConfigInfo #2012

Open. Wants to merge 4 commits into base branch main.

@dimas-b (Contributor) commented Jul 8, 2025

This change allows configuring the "path-style" access mode in S3 clients (both in Polaris Servers and Iceberg REST Catalog API clients).

This change is applicable both to AWS storage and to non-AWS S3-compatible storage (#1530).

The Management API change is backward-compatible.

Also includes a fix for round-tripping the AWS config object through the generated API classes.

Dev ML discussion: https://lists.apache.org/thread/oxy2yd4mbq06zr4z57lfptgy95c3lyhb
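The two addressing modes that the new option toggles differ only in where the bucket name goes in the request URL. The following minimal Java sketch, assuming a plain endpoint URI and an object key that needs no percent-encoding, builds both forms. `S3Addressing` and `objectUri` are hypothetical names used for illustration; they are not part of Polaris, Iceberg, or the AWS SDK, whose real clients also handle request signing, key encoding, and region-specific endpoint rules.

```java
import java.net.URI;

// Hypothetical helper illustrating the two S3 addressing styles; not Polaris code.
final class S3Addressing {

    private S3Addressing() {}

    /**
     * Builds an object URI for the given endpoint.
     * Virtual-hosted style puts the bucket in the host name:
     *   https://my-bucket.s3.us-west-2.amazonaws.com/path/to/key
     * Path style puts the bucket in the first path segment:
     *   https://s3.us-west-2.amazonaws.com/my-bucket/path/to/key
     * Assumes the key needs no percent-encoding.
     */
    static URI objectUri(URI endpoint, String bucket, String key, boolean pathStyleAccess) {
        String scheme = endpoint.getScheme();
        String host = endpoint.getHost();
        int port = endpoint.getPort();
        String portPart = port == -1 ? "" : ":" + port;
        if (pathStyleAccess) {
            return URI.create(scheme + "://" + host + portPart + "/" + bucket + "/" + key);
        }
        return URI.create(scheme + "://" + bucket + "." + host + portPart + "/" + key);
    }

    public static void main(String[] args) {
        URI aws = URI.create("https://s3.us-west-2.amazonaws.com");
        // https://my-bucket.s3.us-west-2.amazonaws.com/db/tbl/data.parquet
        System.out.println(objectUri(aws, "my-bucket", "db/tbl/data.parquet", false));
        // https://s3.us-west-2.amazonaws.com/my-bucket/db/tbl/data.parquet
        System.out.println(objectUri(aws, "my-bucket", "db/tbl/data.parquet", true));
        // Self-hosted S3-compatible endpoints often have no DNS entries for
        // bucket sub-domains, so path style is commonly used there.
        URI local = URI.create("http://localhost:9000");
        // http://localhost:9000/my-bucket/db/tbl/data.parquet
        System.out.println(objectUri(local, "my-bucket", "db/tbl/data.parquet", true));
    }
}
```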

```diff
 }

 public AwsStorageConfigurationInfo(
     @Nonnull StorageType storageType,
     @Nonnull List<String> allowedLocations,
     @Nonnull String roleARN,
     @Nullable String region) {
-  this(storageType, allowedLocations, roleARN, null, region, null, null);
+  this(storageType, allowedLocations, roleARN, null, region, null, null, false);
```

Contributor

We could use the default constant here, or just leave it `null` to rely on the default applied in `orElse`. Better not to proliferate literal defaults like this.
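The reviewer's suggestion can be sketched as follows: keep the field nullable, define the default in exactly one constant, and apply it only when the effective value is read via `Optional.orElse`. This is a hypothetical, simplified class (`StorageConfigSketch`, `pathStyleAccessIfSet`, and `effectivePathStyleAccess` are invented names); the constant name follows the one used in this thread, but the code is not the actual `AwsStorageConfigurationInfo` implementation.

```java
import java.util.Optional;

// Hypothetical, simplified config class; names are assumptions modelled on the
// review discussion, not the actual Polaris source.
final class StorageConfigSketch {

    // Single place where the default value lives.
    static final boolean PATH_STYLE_ACCESS_DEFAULT = false;

    // null means "not configured by the user".
    private final Boolean pathStyleAccess;

    StorageConfigSketch(Boolean pathStyleAccess) {
        this.pathStyleAccess = pathStyleAccess;
    }

    // Convenience constructor: leaves the value unset instead of repeating a literal default.
    StorageConfigSketch() {
        this(null);
    }

    // Raw value, useful for serialization where "unset" should stay unset.
    Optional<Boolean> pathStyleAccessIfSet() {
        return Optional.ofNullable(pathStyleAccess);
    }

    // Effective value for S3 clients: the default is applied only here.
    boolean effectivePathStyleAccess() {
        return pathStyleAccessIfSet().orElse(PATH_STYLE_ACCESS_DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println(new StorageConfigSketch().effectivePathStyleAccess());         // false
        System.out.println(new StorageConfigSketch(true).effectivePathStyleAccess());     // true
        System.out.println(new StorageConfigSketch().pathStyleAccessIfSet().isPresent()); // false
    }
}
```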

Contributor Author

Changed to use `PATH_STYLE_ACCESS_DEFAULT`.

Contributor

Now that we have a default at the API level, is this actually nullable?

Contributor Author

I had to remove the API default.

```yaml
type: boolean
description: >-
  Whether S3 requests to files in this catalog should use 'path-style addressing for buckets'.
  Default: false.
```
Contributor

I believe OpenAPI supports default values; maybe we could just use that?

Contributor Author

Added `default: false` and changed the example to `true`.

Contributor Author

Actually, I had to roll this back. Using OpenAPI defaults breaks strict backward compatibility because it causes the attribute with a default value to always be returned to clients (even when it is not in use). CI caught this.
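One way to see why a schema-level default changes the serialized output: assume the serializer omits null properties (as Jackson can be configured to do with `JsonInclude.Include.NON_NULL`) and the generated model initializes the field to its declared default. The property is then never null and is therefore always written. The standard-library Java sketch below models both cases with plain maps; `NonNullJson`, `withDefault`, `withoutDefault`, and the `roleArn` property name are hypothetical, and the tiny writer assumes values need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Minimal sketch: a JSON writer that omits null-valued properties.
// All names are hypothetical; this is not Polaris or generated code.
final class NonNullJson {

    private NonNullJson() {}

    static String write(Map<String, Object> props) {
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : props.entrySet()) {
            Object v = e.getValue();
            if (v == null) {
                continue; // unset properties are not emitted
            }
            String json = v instanceof String ? "\"" + v + "\"" : String.valueOf(v);
            out.add("\"" + e.getKey() + "\":" + json);
        }
        return out.toString();
    }

    // Model without a schema default: the field stays null unless set.
    static Map<String, Object> withoutDefault(String roleArn, Boolean pathStyleAccess) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("roleArn", roleArn);
        m.put("pathStyleAccess", pathStyleAccess);
        return m;
    }

    // Model with a schema default: the field is pre-initialized to false,
    // so it is never null and therefore always serialized.
    static Map<String, Object> withDefault(String roleArn, Boolean pathStyleAccess) {
        Map<String, Object> m = withoutDefault(roleArn, pathStyleAccess);
        if (pathStyleAccess == null) {
            m.put("pathStyleAccess", false);
        }
        return m;
    }

    public static void main(String[] args) {
        String arn = "arn:aws:iam::123456789012:role/example";
        // {"roleArn":"arn:aws:iam::123456789012:role/example"}
        System.out.println(write(withoutDefault(arn, null)));
        // {"roleArn":"arn:aws:iam::123456789012:role/example","pathStyleAccess":false}
        System.out.println(write(withDefault(arn, null)));
    }
}
```

The second line of output is the kind of JSON text change a serialization snapshot test would flag for configs that never set the new property.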

Contributor

What exactly was the failure in CI? The client would fail to parse a response because it didn't recognize the property?

Contributor Author

`CatalogSerializationTest` detected a change in the JSON text when the new properties were not set.

Contributor Author

actually, never mind, I'll expand this PR a bit :)

Contributor Author

updated. PTAL.

Contributor

Thanks, the changes look better, but my concern about the default remains

Member

TBH, I do not understand the whole discussion about "is this a breaking change".

As Polaris is primarily an Iceberg REST catalog, we have to let JSON parsing ignore unknown properties, because Iceberg REST requires it. There is also exactly one JSON parser (ObjectMapper) for all endpoints. The situation is unfortunate, but it is a consequence of generating code from the OpenAPI spec. Jackson itself allows much more fine-grained control over how types are (de)serialized, including backward/forward compatibility features such as @JsonView, but those are not usable with the code-generation approach.

So the fact is that unknown properties in JSON objects in any Polaris REST payload have to be ignored. Since that behavior leaks into all Polaris APIs, it also leaks to clients. This means that adding a new property is not a breaking change. If clients fail on unknown properties, that is a bug in those clients.
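The compatibility argument rests on "tolerant reader" behavior, which Jackson exposes as `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES`. Below is a dependency-free Java sketch of the same idea: an old client that only knows three properties reads a payload from a newer server. With `failOnUnknown` set to false the extra `pathStyleAccess` property is skipped; with it set to true the client fails, which corresponds to the client-side bug described above. `TolerantReader`, `OldClientConfig`, `KNOWN`, and the property names are hypothetical, and maps stand in for parsed JSON.

```java
import java.util.Map;
import java.util.Set;

// Sketch of a tolerant reader for an older client. All names are hypothetical.
final class TolerantReader {

    // Properties the older client was built against (assumed for illustration).
    static final Set<String> KNOWN = Set.of("storageType", "roleArn", "region");

    record OldClientConfig(String storageType, String roleArn, String region) {}

    static OldClientConfig read(Map<String, Object> json, boolean failOnUnknown) {
        for (String key : json.keySet()) {
            if (failOnUnknown && !KNOWN.contains(key)) {
                throw new IllegalArgumentException("Unrecognized property: " + key);
            }
            // In tolerant mode, unknown keys are simply skipped.
        }
        return new OldClientConfig(
                (String) json.get("storageType"),
                (String) json.get("roleArn"),
                (String) json.get("region"));
    }

    public static void main(String[] args) {
        // Payload from a newer server that includes the new property.
        Map<String, Object> payload = Map.of(
                "storageType", "S3",
                "roleArn", "arn:aws:iam::123456789012:role/example",
                "region", "us-west-2",
                "pathStyleAccess", true);
        System.out.println(read(payload, false)); // parses; pathStyleAccess is ignored
        try {
            read(payload, true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unrecognized property: pathStyleAccess
        }
    }
}
```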

Contributor Author

@eric-maynard: the new test shows that AWS config data round-trips through the API both with and without values (i.e. defaults) for the new properties.

If the new properties are not set, clients will not see them in the JSON output of the REST API endpoints. So this change does not break existing installations or old clients as long as Polaris users do not use the new properties. If users do reconfigure catalogs to use the new properties, it is their responsibility to also update clients to be compatible with the catalog JSON configuration (e.g. use the new CLI).
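A round-trip check in the spirit of the new test might look like this: serialize a config, deserialize it back, and compare, once with the new property unset and once with it set, while also checking that an unset property does not appear in the output. This is an illustrative sketch with hypothetical names (`RoundTripSketch`, `AwsConfig`, `toProps`, `fromProps`, `roundTrips`) that uses maps in place of JSON; it is not the test added in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical round-trip check: unset optional properties must not appear in
// the output, and values must survive serialize -> deserialize unchanged.
final class RoundTripSketch {

    record AwsConfig(String roleArn, Boolean pathStyleAccess) {}

    static Map<String, Object> toProps(AwsConfig c) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("roleArn", c.roleArn());
        if (c.pathStyleAccess() != null) {
            m.put("pathStyleAccess", c.pathStyleAccess()); // only emitted when set
        }
        return m;
    }

    static AwsConfig fromProps(Map<String, Object> m) {
        return new AwsConfig((String) m.get("roleArn"), (Boolean) m.get("pathStyleAccess"));
    }

    static boolean roundTrips(AwsConfig c) {
        // Record equality compares components, treating null == null as equal.
        return Objects.equals(c, fromProps(toProps(c)));
    }

    public static void main(String[] args) {
        AwsConfig unset = new AwsConfig("arn:aws:iam::123456789012:role/example", null);
        AwsConfig set = new AwsConfig("arn:aws:iam::123456789012:role/example", true);
        System.out.println(toProps(unset).containsKey("pathStyleAccess")); // false
        System.out.println(roundTrips(unset) && roundTrips(set));          // true
    }
}
```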

snazy previously approved these changes Jul 9, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jul 9, 2025
singhpk234 previously approved these changes Jul 9, 2025
Contributor

@singhpk234 left a comment

LGTM !

Contributor Author

dimas-b commented Jul 9, 2025

I had to remove one commit (discussed in the API YAML comment thread). PTAL.

Contributor

@eric-maynard left a comment

Let's clarify whether this is a breaking change (Does it need to target 2.0? Should it go into API version 2?) and plan accordingly before merging.

@github-project-automation github-project-automation bot moved this from Ready to merge to PRs In Progress in Basic Kanban Board Jul 10, 2025
snazy previously approved these changes Jul 10, 2025
Member

@snazy left a comment

LGTM

dimas-b added 3 commits July 10, 2025 18:03
Contributor Author

dimas-b commented Jul 14, 2025

@eric-maynard: I believe I addressed your backward-compatibility concern in the latest push. Could you review again?

4 participants