Skip to content

[Feature Request] Add node-level hook to TransportBroadcastByNodeAction? #19183

@peteralfonsi

Description

@peteralfonsi

Is your feature request related to a problem? Please describe

TransportBroadcastByNodeAction is an abstract class that runs an action on shards (shardOperation() which is overridden by inheriting classes), and afterwards combines shard-level results on each node, and sends that NodeResponse back to the coordinating node.

I was thinking it might be useful to have a hook that would run once per node, after all that node's shard-level operations have completed, before sending the aggregated NodeResponse back.

This is motivated by TransportClearIndicesCacheAction (see #19118). Today in the request cache, each invocation of shardOperation() queues a cleanup key for that shard (marking it for removal in the next cleanup), and then immediately force cleans the node-cache, triggering a full iteration through all its keys. Since we do this once per shard on the node it can extend cleanup times with unnecessary iterating, especially if the disk tier is used (since its removal flow is much slower). It would be ideal if each shardOperation() only queued the cleanup key for that shard, and then we iterated a single time per node by force cleaning the cache during this node-level hook. I imagine this might be useful for other actions too.

Describe the solution you'd like

We could run the hook within TransportBroadcastByNodeAction.BroadcastByNodeTransportRequestHandler.messageReceived(), at line 481. This is right after shardOperation() has run for each shard, and all results/exceptions have been added to the list of results, but right before creating the NodeResponse and sending it back.

The method signature could be something like protected default nodeOperation(NodeRequest request, List<ShardOperationResult> results, List<BroadcastShardOperationFailedException> accumulatedExceptions, Task task). The default body would be noop. For cache clear specifically we would only need request but the shard results + task might be useful for other implementers.

One thing to consider is exception handling. I'm inclined to have exceptions in the hook result in the failure of the node-level action, and implementers can swallow exceptions themselves in nodeOperation() if they don't want that to happen. It seems more flexible than the other way around, and the body of nodeOperation() may be critical enough to the whole action that we would want to surface exceptions there.

Related component

Other

Describe alternatives you've considered

For the specific cache clear problem, another solution is to just remove the force clear from the action altogether, and wait for the next scheduled cleanup to pick up all the marked keys. This would take ~1min. I think this is an acceptable solution too, but adding a node-level hook would allow us to both make cache clear more efficient and not have to wait for clearing to begin after the API is hit. The same logic applies for the field data cache which is currently being reworked.

Additional context

No response

Metadata

Metadata

Assignees

Labels

OtherenhancementEnhancement or improvement to existing feature or requestuntriaged

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions