Extensible pagination token implementation #1938

Open
wants to merge 1 commit into main

snazy
Member

@snazy snazy commented Jun 25, 2025

Based on #1838, following up on #1555

  • Allows multiple implementations of Token referencing the "next page", encapsulated in PageToken. No changes to polaris-core needed to add custom Token implementations.
  • Extensible to (later) support (cryptographic) signatures to detect tampered page tokens
  • Refactor pagination code to delineate API-level page tokens and internal "pointers to data"
  • Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
  • Concentrate the logic of combining page size requests and previous tokens in PageTokenUtil
  • PageToken subclasses are no longer necessary.
  • Serialization of PageToken uses Jackson serialization (Smile format)

Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed.
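
The description lists signatures only as a possible later extension, so the following is a minimal sketch of one way tamper detection could work, not what this PR implements. It assumes an HMAC-SHA256 tag appended to the serialized token bytes; the class name `SignedPageTokenSketch`, its methods, and the key handling are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper (not part of the PR): appends an HMAC-SHA256 tag to token bytes. */
final class SignedPageTokenSketch {
  private static final String ALGORITHM = "HmacSHA256";
  private static final int TAG_LENGTH = 32; // HMAC-SHA256 output size in bytes

  private final SecretKeySpec key;

  SignedPageTokenSketch(byte[] secret) {
    this.key = new SecretKeySpec(secret, ALGORITHM);
  }

  private byte[] tag(byte[] payload) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(key);
      return mac.doFinal(payload);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Returns base64url(payload || tag), suitable for use as an opaque page token. */
  String sign(byte[] payload) {
    byte[] tag = tag(payload);
    byte[] out = ByteBuffer.allocate(payload.length + tag.length).put(payload).put(tag).array();
    return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
  }

  /** Returns the payload if the tag matches, or empty if the token is malformed or tampered. */
  Optional<byte[]> verify(String token) {
    byte[] raw;
    try {
      raw = Base64.getUrlDecoder().decode(token);
    } catch (IllegalArgumentException e) {
      return Optional.empty();
    }
    if (raw.length < TAG_LENGTH) {
      return Optional.empty();
    }
    byte[] payload = Arrays.copyOfRange(raw, 0, raw.length - TAG_LENGTH);
    byte[] actual = Arrays.copyOfRange(raw, raw.length - TAG_LENGTH, raw.length);
    // MessageDigest.isEqual compares in constant time, which avoids leaking tag bytes via timing.
    return MessageDigest.isEqual(tag(payload), actual) ? Optional.of(payload) : Optional.empty();
  }
}
```

In this sketch the payload would be the Smile-serialized `PageToken`; a token whose bytes were altered by the client fails `verify` and can be rejected before it is deserialized.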

@snazy
Member Author

snazy commented Jun 25, 2025

This approach also works for NoSQL #1189 (last commit in this branch)

@dimas-b dimas-b requested a review from eric-maynard June 25, 2025 16:25
}

@JsonCreator
private EntitiesResult(
Contributor

If we can avoid it, I think it's better not to change this. This will be a breaking change for any persistence implementation depending on wire-compatibility of the result.

Contributor

cc @dennishuo for visibility as well

Contributor

It is a private constructor... @eric-maynard @dennishuo : could you give some more details about possible compatibility issues? What's the affected workflow?

Contributor

I do not see any roundtrips through JSON for EntitiesResult in Apache Polaris code. If this is a use case downstream, it would be nice to have tests in this repo to assert the expected behaviour.

That said, I think this specific change is not critical to pagination. EntitiesResult is not involved in handling paginated data. So, I suppose this change can be rolled back.

Member Author

I do not see a "wire" either.
BTW: The same change wasn't considered an issue in 1838.

}

@JsonCreator
private ListEntitiesResult(
Contributor

ditto the comment on wire-compatibility with previous versions

Contributor

@eric-maynard : It's good that you're able to identify this change as a regression, but what if you're not available ? :) WDYT about encoding this expectation in a unit test?

public interface EntityIdToken extends Token {
String ID = "e";

@JsonProperty("i")
Contributor

Why would we use the property name i?

Contributor

I believe short names are meant to reduce the size of the token on the wire.

Member Author

Yes, as explained in Token
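
To make the size argument concrete, here is a rough sketch comparing the encoded length of the same token content with short and with descriptive property names. It uses plain JSON plus base64url purely for illustration (the PR uses Jackson's Smile format, where the exact numbers differ); `t` is assumed from `getT()`, and the descriptive names and the class `TokenSizeSketch` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustration only: how property-name length affects the size of an encoded token. */
final class TokenSizeSketch {
  /** Length of the base64url text a client would have to send back as the page token. */
  static int encodedLength(String json) {
    byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes).length();
  }

  // Short names as in the PR ("p", "v", "i"); "t" is assumed to be the type discriminator.
  static final String SHORT_NAMES = "{\"p\":100,\"v\":{\"t\":\"e\",\"i\":12345}}";

  // Hypothetical descriptive names carrying the same information.
  static final String LONG_NAMES =
      "{\"pageSize\":100,\"value\":{\"type\":\"entityId\",\"entityId\":12345}}";

  public static void main(String[] args) {
    System.out.println("short names: " + encodedLength(SHORT_NAMES) + " chars");
    System.out.println("long names:  " + encodedLength(LONG_NAMES) + " chars");
  }
}
```

Property names are written into every token, so the overhead is paid on every paginated request; the trade-off is readability of the serialized form, which is what the review comments question.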

long entityId();

@Override
default String getT() {
Contributor

getT?

Member Author

The reason is explained as doc in Token

@JsonSerialize(as = ImmutableEntityIdToken.class)
@JsonDeserialize(as = ImmutableEntityIdToken.class)
public interface EntityIdToken extends Token {
String ID = "e";
Contributor

Is this meant to be short for EntityId? Why not use an enum?

Member Author

That contradicts the intent of being extensible.
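
The extensibility point can be sketched as follows: with string IDs, a module outside the core can add a token kind by registering a decoder, whereas an enum would require editing the enum in the core module for every new kind. This is a simplified illustration, not the PR's mechanism; `TokenTypeRegistrySketch`, `KeyToken`, the `"k"` ID, and the string-payload decoders are hypothetical, and only the `"e"` ID comes from the diff.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical registry keyed by short string IDs; token kinds register themselves. */
final class TokenTypeRegistrySketch {
  interface Token {
    String typeId();
  }

  /** Mirrors the idea of EntityIdToken from the diff (ID "e"). */
  record EntityIdToken(long entityId) implements Token {
    static final String ID = "e";

    @Override
    public String typeId() {
      return ID;
    }
  }

  /** A token kind that could live outside the core module, e.g. in a NoSQL backend. */
  record KeyToken(String key) implements Token {
    static final String ID = "k";

    @Override
    public String typeId() {
      return ID;
    }
  }

  private final Map<String, Function<String, Token>> decoders = new ConcurrentHashMap<>();

  /** Registers a decoder for a type ID; duplicate IDs are rejected to avoid silent clashes. */
  void register(String id, Function<String, Token> decoder) {
    if (decoders.putIfAbsent(id, decoder) != null) {
      throw new IllegalArgumentException("Duplicate token type ID: " + id);
    }
  }

  /** Decodes a payload with the decoder registered for the ID, or returns empty if unknown. */
  Optional<Token> decode(String id, String payload) {
    return Optional.ofNullable(decoders.get(id)).map(decoder -> decoder.apply(payload));
  }
}
```

The cost of open string IDs is that uniqueness is no longer checked by the compiler, which is why the sketch rejects duplicate registrations at runtime.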


// Signal "no more data" if the number of items is less than the requested page size or if
// there is no more data.
if (data.size() < limit || !it.hasNext()) {
Contributor

It should be expected that the iterator has no more items if the limit has been pushed down into whatever provided the items, no?

Contributor

@dimas-b dimas-b Jun 26, 2025

Good point! It also looks like we do not push limits down to the database.

I propose to remove !it.hasNext() for the sake of simplicity, accept possible empty last pages, and work on exact page boundaries later.

Member Author

There is a test for this - and it passes

Contributor

@dimas-b dimas-b Jun 26, 2025

I wonder whether the test passes because limits are not pushed down to Persistence queries anywhere 🤔
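
The concern in this thread can be modelled with a small in-memory sketch. It assumes a sorted list as the data source and simulates limit push-down by truncating what the iterator can see; `LastPageSketch`, `readPage`, and `drain` are hypothetical names and this is not the PR's code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Models the "no more data" check from the diff against an in-memory source. */
final class LastPageSketch {
  /** One page of items plus whether a next-page token would be issued. */
  record Page(List<Integer> items, boolean hasNextToken) {}

  static Page readPage(Iterator<Integer> it, int limit, boolean checkHasNext) {
    List<Integer> data = new ArrayList<>();
    while (data.size() < limit && it.hasNext()) {
      data.add(it.next());
    }
    // Same shape as the condition under review: data.size() < limit || !it.hasNext()
    boolean done = data.size() < limit || (checkHasNext && !it.hasNext());
    return new Page(data, !done);
  }

  /** Pages through the source until no token is issued; returns the size of each page. */
  static List<Integer> drain(
      List<Integer> source, int limit, boolean checkHasNext, boolean pushDownLimit) {
    if (limit < 1) {
      throw new IllegalArgumentException("limit must be >= 1");
    }
    List<Integer> pageSizes = new ArrayList<>();
    int offset = 0;
    boolean more = true;
    while (more) {
      List<Integer> remaining = source.subList(offset, source.size());
      // With push-down the backend returns at most `limit` rows, so the iterator
      // is always exhausted after a full page.
      List<Integer> visible =
          pushDownLimit ? remaining.subList(0, Math.min(limit, remaining.size())) : remaining;
      Page page = readPage(visible.iterator(), limit, checkHasNext);
      pageSizes.add(page.items().size());
      offset += page.items().size();
      more = page.hasNextToken();
    }
    return pageSizes;
  }

  public static void main(String[] args) {
    List<Integer> rows = List.of(1, 2, 3, 4);
    System.out.println(drain(rows, 2, true, false));  // hasNext check, no push-down
    System.out.println(drain(rows, 2, false, false)); // size check only
    System.out.println(drain(rows, 2, true, true));   // hasNext check with push-down
  }
}
```

With four rows and a page size of two, the `hasNext` check without push-down yields pages of sizes `[2, 2]`, the size-only check yields `[2, 2, 0]` (an empty last page), and the `hasNext` check combined with push-down stops after `[2]`, dropping the remaining rows. That last case is why a passing test may depend on limits not being pushed down.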

* servicing the request for the next page of related data.
*/
public @Nullable String encodedResponseToken() {
return PageTokenUtil.encodePageToken(request, nextToken);
Contributor

Same comments from the previous PR -- encoding/decoding a given PageToken clearly seems to be within the purview of a given PageToken implementation

Member Author

There is only one PageToken.

Comment on lines +30 to +32
@PolarisImmutable
@JsonSerialize(as = ImmutablePageToken.class)
@JsonDeserialize(as = ImmutablePageToken.class)
Contributor

Would rather avoid these if possible

return build(null, pageSize);
}
/** The requested page size (optional). */
@JsonProperty("p")
Contributor

@eric-maynard eric-maynard Jun 25, 2025

ditto the comment on i -- this is not a good property name.

*/
protected abstract PageToken updated(List<?> newData);
@JsonProperty("v")
Contributor

ditto

*/
protected abstract PageToken updated(List<?> newData);
@JsonProperty("v")
Optional<Token> value();
Contributor

What is value? This seems a bit abstract.

}
/** Convenience for {@code pageSize().isPresent()}. */
default boolean paginationRequested() {
return pageSize().isPresent();
Contributor

It's possible per the spec to request pagination by providing a page token without page size. This logic is not correct.

Contributor

The logic looks correct to me. Propagating the page size from the previous token is handled in PageTokenUtil.decodePageRequest()
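
A minimal sketch of the propagation idea, under the assumption that an explicit page size in the request takes precedence and the previous token's page size is the fallback. The thread only states that propagation happens in `PageTokenUtil.decodePageRequest()`; the precedence rule, `PageSizeResolutionSketch`, and `PreviousToken` are hypothetical.

```java
import java.util.Optional;
import java.util.OptionalInt;

/** Hypothetical model of combining a requested page size with a previous token. */
final class PageSizeResolutionSketch {
  /** Stand-in for a decoded previous token that remembers the page size it was issued with. */
  record PreviousToken(int pageSize) {}

  /** Assumed precedence: explicit request first, then the previous token, else no pagination. */
  static OptionalInt effectivePageSize(OptionalInt requested, Optional<PreviousToken> previous) {
    if (requested.isPresent()) {
      return requested;
    }
    return previous.map(t -> OptionalInt.of(t.pageSize())).orElseGet(OptionalInt::empty);
  }

  /** Pagination counts as requested once the combined page size is present. */
  static boolean paginationRequested(OptionalInt requested, Optional<PreviousToken> previous) {
    return effectivePageSize(requested, previous).isPresent();
  }
}
```

Under this model a request carrying only a page token still resolves to a page size, so a check equivalent to `pageSize().isPresent()` on the combined result covers the token-without-size case raised above.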


/** Represents a non-paginated request. */
static PageToken readEverything() {
return PageTokenUtil.READ_EVERYTHING;
Contributor

Is READ_EVERYTHING a Util? Seems like it would belong in PageToken

@eric-maynard
Contributor

Thanks for taking a crack at this. I appreciate the extra extensibility compared to #1838, but considering this diff is twice the size of #1555, I'm not sure there's enough functionality here to justify all the changes. I took a quick pass for now, but will try to circle back and review the rest of the PR in more detail later.

Comment on lines +377 to +385
data = data.sorted(Comparator.comparingLong(PolarisEntityCore::getId)).filter(tokenFilter);

data = data.filter(entityFilter);
Contributor

Why break this into two calls?

Contributor

(I think it came from my PR) No particular reason. Would you prefer a chained call?

Contributor

@dimas-b dimas-b left a comment

Changes LGTM 👍

Contributor

Token$TokenType looks awkward as a file name ($ and double Token)... WDYT about making TokenType a top-level class?

Member Author

It's the class name.
I'd really prefer keeping these very related things (Token + TokenType) together - you have to implement both.

@snazy
Member Author

snazy commented Jun 26, 2025

This change enables pagination also for #1189, whereas #1555 does not.

@snazy snazy force-pushed the pagination-alt-flex branch 3 times, most recently from 191d99d to 6cdb73c Compare July 1, 2025 04:56
@snazy snazy force-pushed the pagination-alt-flex branch 3 times, most recently from d072060 to 88ae32d Compare July 7, 2025 08:49
@snazy snazy force-pushed the pagination-alt-flex branch 2 times, most recently from 37eb5c3 to b472eef Compare July 11, 2025 10:41
Based on apache#1838, following up on apache#1555

* Allows multiple implementations of `Token` referencing the "next page", encapsulated in `PageToken`. No changes to `polaris-core` needed to add custom `Token` implementations.
* Extensible to (later) support (cryptographic) signatures to detect tampered page tokens
* Refactor pagination code to delineate API-level page tokens and internal "pointers to data"
* Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
* Concentrate the logic of combining page size requests and previous tokens in `PageTokenUtil`
* `PageToken` subclasses are no longer necessary.
* Serialization of `PageToken` uses Jackson serialization (Smile format)

Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed.

Co-authored-by: Dmitri Bourlatchkov
Co-authored-by: Eric Maynard
@snazy snazy force-pushed the pagination-alt-flex branch from b472eef to edb7cc4 Compare July 12, 2025 12:05