-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Track shardStarted events for simulation in DesiredBalanceComputer #133630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi @ywangd, I've created a changelog YAML for you. |
I had some back-and-forth with the way to track shardStarted events. At the end, I decided to do it with mostly |
// Check whether the shard has actually started on the target node | ||
final var startedShard = routingNodes.assignedShards(shardForSimulation.shardId()) | ||
.stream() | ||
.filter( | ||
shard -> shard.started() | ||
&& shard.primary() == shardForSimulation.primary() | ||
&& shard.currentNodeId().equals(shardForSimulation.currentNodeId()) | ||
) | ||
.findFirst() | ||
.orElse(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might feel a bit hacky since normally we should check allocationId
to be certain. But simulation and actual shard start event have different allocationId
and cannot be compared. Since this check is in a tight loop of balance computation and tracking is reset every ClusterInfo polling, it seems sufficient to rely on other properties, e.g. no two copies of the same shard can be allocated on the same node. There could be some fuzziness here in edge cases. But I think they are not really concerning atm. Happy to take advice.
// A new ClusterInfo has arrived, clear the tracking for started shards | ||
if (lastClusterInfo != desiredBalanceInput.routingAllocation().clusterInfo()) { | ||
lastClusterInfo = desiredBalanceInput.routingAllocation().clusterInfo(); | ||
shardsStartedByAllocate = new HashMap<>(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with this approach is that a shard allocated in last polling of ClusterInfo but started in this polling cycle is not accounted for, i.e. the following sequence of events:
- 1st CluserInfo poll
- Allocator moves a shard, but it has not started on the target node
- 2nd ClusterInfo poll
- The shard starts on the target node
I think we want the started shard contributing to the simulations performed in the 2nd ClusterInfo polling cycle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would that not be simulated by the shard started done in the beginning of compute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind the approach but I think it adds some complexity and state to an already quite complex/stateful bit of code
shardsStartedByAllocate = new HashMap<>(); | ||
} | ||
|
||
final var alreadySimulatedStartedShards = new HashSet<ShardForSimulation>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we ask the simulator for the "alreadySimulatedShards" instead of tracking them alongside the simulateShardStarted calls? or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean tracking it inside ClusterInfoSimulator
? Yes that's possible. It does not do that currently but we can make it do so.
I think it will have to add some complexity. But if we track the real shard started events, the complexity might be a bit less in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few initial comments, did not get into the weeds of the started simulations yet
return currentClusterInfo; | ||
} | ||
|
||
private void updateAndGetCurrentClusterInfo() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name here hints that it should return the cluster info? That would seem nice to do, but I'd also be fine to just call it updateClusterInfo
// A new ClusterInfo has arrived, clear the tracking for started shards | ||
if (lastClusterInfo != desiredBalanceInput.routingAllocation().clusterInfo()) { | ||
lastClusterInfo = desiredBalanceInput.routingAllocation().clusterInfo(); | ||
shardsStartedByAllocate = new HashMap<>(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would that not be simulated by the shard started done in the beginning of compute?
@@ -77,6 +80,8 @@ public class DesiredBalanceComputer { | |||
private long lastConvergedTimeMillis; | |||
private long lastNotConvergedLogMessageTimeMillis; | |||
private Level convergenceLogMsgLevel; | |||
private ClusterInfo lastClusterInfo; | |||
private Map<ShardForSimulation, ShardRouting> shardsStartedByAllocate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we see this as input instead? I think I prefer to track this outside this class rather than being this stateful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could even reside on or be accessible through the ClusterInfo
object?
I got some new idea after talking to Henning. I'll rework this PR. Please hold on your reviews. Thanks! 😅 |
If a shard starts on the target node before the next ClusterInfo polling, today we don't include it for the simulation. With this PR, we track shards that can potentially start within one ClusterInfo polling cycle so that they are always included in simulation. The tracking is reset when a new ClusterInfo arrives.
Resolves: ES-12723