-
Notifications
You must be signed in to change notification settings - Fork 271
Extensible pagination token implementation #1938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
12ab744
to
1b42120
Compare
This approach also works for NoSQL #1189 (last commit in this branch) |
} | ||
|
||
@JsonCreator | ||
private EntitiesResult( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can avoid it, I think it's better not to change this. This will be a breaking change for any persistence implementation depending on wire-compatibility of the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @dennishuo for visibility as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a private
constructor... @eric-maynard @dennishuo : could you give some more details about possible compatibility issues? What's the affected workflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any roundtrips through JSON for EntitiesResult
in Apache Polaris code. If this is a use case downstream, it would be nice to have tests in this repo to assert the expected behaviour.
That said, I think this specific change is not critical to pagination. EntitiesResult
is not involved handling paginated data. So, I suppose this change can be rolled back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a "wire" either.
BTW: The same change wasn't considered an issue in 1838.
} | ||
|
||
@JsonCreator | ||
private ListEntitiesResult( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto the comment on wire-compatibility with previous versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eric-maynard : It's good that you're able to identify this change as a regression, but what if you're not available ? :) WDYT about encoding this expectation in a unit test?
public interface EntityIdToken extends Token { | ||
String ID = "e"; | ||
|
||
@JsonProperty("i") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we use the property name i
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe short names are meant to reduce the side of the token on the wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as explained in Token
long entityId(); | ||
|
||
@Override | ||
default String getT() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is explained as doc in Token
@JsonSerialize(as = ImmutableEntityIdToken.class) | ||
@JsonDeserialize(as = ImmutableEntityIdToken.class) | ||
public interface EntityIdToken extends Token { | ||
String ID = "e"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be short for EntityId
? Why not use an enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That contradicts the intent of being extensible.
|
||
// Signal "no more data" if the number of items is less than the requested page size or if | ||
// there is no more data. | ||
if (data.size() < limit || !it.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be expected that the iterator has no more items if the limit has been pushed down into whatever provided the items
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! It also looks like we do not push limits down to the database.
I propose to remove !it.hasNext()
for the sake of simplicity, accept possible empty last pages, and work on exact page boundaries later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a test for this - and it passes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether the test passes because limits are not pushed down to Persistence queries anywhere 🤔
* servicing the request for the next page of related data. | ||
*/ | ||
public @Nullable String encodedResponseToken() { | ||
return PageTokenUtil.encodePageToken(request, nextToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments from the previous PR -- encoding/decoding a given PageToken clearly seems to be within the purview of a given PageToken implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one PageToken
.
@PolarisImmutable | ||
@JsonSerialize(as = ImmutablePageToken.class) | ||
@JsonDeserialize(as = ImmutablePageToken.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would rather avoid these if possible
return build(null, pageSize); | ||
} | ||
/** The requested page size (optional). */ | ||
@JsonProperty("p") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto the comment on i
-- this is not a good property name.
*/ | ||
protected abstract PageToken updated(List<?> newData); | ||
@JsonProperty("v") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
*/ | ||
protected abstract PageToken updated(List<?> newData); | ||
@JsonProperty("v") | ||
Optional<Token> value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is value
? This seems a bit abstract.
} | ||
/** Convenience for {@code pageSize().isPresent()}. */ | ||
default boolean paginationRequested() { | ||
return pageSize().isPresent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible per the spec to request pagination by providing a page token without page size. This logic is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic looks correct to me. Propagating page size from the previous token in handled in PageTokenUtil.decodePageRequest()
|
||
/** Represents a non-paginated request. */ | ||
static PageToken readEverything() { | ||
return PageTokenUtil.READ_EVERYTHING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is READ_EVERYTHING
a Util
? Seems like it would belong in PageToken
Thanks for taking a crack at this. I appreciate the extra extensibility compared to #1838, but considering this diff is twice the size of #1555, I'm not sure there's enough functionality here to justify all the changes. I took a quick pass for now, but will try to circle back and review the rest of the PR in more detail later. |
data = data.sorted(Comparator.comparingLong(PolarisEntityCore::getId)).filter(tokenFilter); | ||
|
||
data = data.filter(entityFilter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why break this into two calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think it came from my PR) No particular reason. Would you prefer a chained call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Token$TokenType
looks awkward as a file name ($
and double Token
)... WDYT about making TokenType
a top-level class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the class name.
I'd really prefer keeping these very related things (Token
+ TokenType
) together - you have to implement both.
polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageToken.java
Show resolved
Hide resolved
191d99d
to
6cdb73c
Compare
d072060
to
88ae32d
Compare
37eb5c3
to
b472eef
Compare
Based on apache#1838, following up on apache#1555 * Allows multiple implementations of `Token` referencing the "next page", encapsulated in `PageToken`. No changes to `polaris-core` needed to add custom `Token` implementations. * Extensible to (later) support (cryptographic) signatures to prevent tampered page-token * Refactor pagination code to delineate API-level page tokens and internal "pointers to data" * Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size. * Concentrate the logic of combining page size requests and previous tokens in `PageTokenUtil` * `PageToken` subclasses are no longer necessary. * Serialzation of `PageToken` uses Jackson serialization (smile format) Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed. Co-authored-by: Dmitri Bourlatchkov <[email protected]> Co-authored-by: Eric Maynard <[email protected]>
b472eef
to
edb7cc4
Compare
Based on #1838, following up on #1555
Token
referencing the "next page", encapsulated inPageToken
. No changes topolaris-core
needed to add customToken
implementations.PageTokenUtil
PageToken
subclasses are no longer necessary.PageToken
uses Jackson serialization (smile format)Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed.