Remove unnecessary replication calls & make them retry on different instances #18564
This should be reviewed commit by commit.
Nowadays it's trivial to propagate cache invalidations, which means we can move some things off the main process, and not go through HTTP replication.
ReplicationGetQueryRestServlet appeared to be unused, and was very odd: it was called only when the current instance was the main one… to RPC to the main one (if no instance is set on a replication client, the request goes to the main process).
The other two handlers could be moved to any worker relatively trivially, by moving some methods to the worker store.
Then the main feature here is that retries are done with a round-robin across writer instances. This means that if a writer is temporarily down, it will try on another (potentially healthy) one.
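To illustrate the retry behaviour described above, here is a minimal sketch of round-robin retries across writer instances. It is not the code from this PR: `call_with_round_robin`, `AllInstancesFailed`, the `send` callback, and the choice of `ConnectionError` as the retryable error are all hypothetical names and assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class AllInstancesFailed(Exception):
    """Hypothetical error raised when every attempted instance failed."""


async def call_with_round_robin(
    instances: Sequence[str],
    send: Callable[[str], Awaitable[T]],
    max_attempts: Optional[int] = None,
    start: int = 0,
) -> T:
    """Try `send` on each instance in turn, starting at index `start`.

    Assumption: a ConnectionError means the instance is unreachable and
    the request is safe to retry on another instance.
    """
    if not instances:
        raise ValueError("no writer instances configured")

    # By default, give every instance exactly one chance.
    attempts = len(instances) if max_attempts is None else max_attempts
    last_error: Optional[Exception] = None

    for i in range(attempts):
        # Wrap around the list so retries rotate through all instances.
        instance = instances[(start + i) % len(instances)]
        try:
            return await send(instance)
        except ConnectionError as e:
            last_error = e  # Remember the failure and move to the next one.

    raise AllInstancesFailed(
        f"request failed on all {attempts} attempt(s)"
    ) from last_error
```

With two writers where the first is down, the call fails once on the first instance and then succeeds on the second, rather than retrying the same unhealthy instance.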
I've intentionally not removed the replication servlets yet so that this is safe to roll out; a follow-up PR will clean those up and remove them in the N+1 version.