Track shardStarted events for simulation in DesiredBalanceComputer #133630

ywangd · 2025-08-27T08:36:21Z

If a shard starts on the target node before the next ClusterInfo poll, the simulation currently does not account for it. With this PR, we track shards that can potentially start within one ClusterInfo polling cycle so that they are always included in the simulation. The tracking is reset when a new ClusterInfo arrives.

Resolves: ES-12723

Relates: ES-12723

elasticsearchmachine · 2025-08-27T08:36:46Z

Hi @ywangd, I've created a changelog YAML for you.

ywangd · 2025-08-27T08:45:35Z

I went back and forth on how to track shardStarted events. In the end, I decided to do it mostly within DesiredBalanceComputer since (1) it is the only place where it is needed and (2) it requires fewer wiring changes than tracking inside InternalClusterInfoService. I am raising it as a draft to seek agreement on the approach. I will work on more tests if we are OK to proceed, or I can take a different approach if folks are not happy with the current one. Thanks!

ywangd · 2025-08-27T08:50:55Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

+            // Check whether the shard has actually started on the target node
+            final var startedShard = routingNodes.assignedShards(shardForSimulation.shardId())
+                .stream()
+                .filter(
+                    shard -> shard.started()
+                        && shard.primary() == shardForSimulation.primary()
+                        && shard.currentNodeId().equals(shardForSimulation.currentNodeId())
+                )
+                .findFirst()
+                .orElse(null);


This might feel a bit hacky since normally we should check the allocationId to be certain. But the simulated shard and the actually started shard have different allocationIds, so they cannot be compared that way. Since this check is in a tight loop of the balance computation and the tracking is reset on every ClusterInfo poll, it seems sufficient to rely on other properties, e.g. no two copies of the same shard can be allocated on the same node. There could be some fuzziness here in edge cases, but I don't think they are really concerning at the moment. Happy to take advice.
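For readers who want to try the matching idea in isolation, here is a minimal, self-contained sketch. It assumes a hypothetical `ShardCopy` record as a simplified stand-in for `ShardRouting`; neither the record nor `findStartedCopy` is part of Elasticsearch. It matches on started state, primary flag, and node ID, and deliberately ignores the allocation ID.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in for ShardRouting (not the Elasticsearch class).
record ShardCopy(String shardId, boolean primary, String nodeId, String allocationId, boolean started) {}

final class StartedShardMatchSketch {

    // Finds a started copy that matches the simulated one without comparing allocation IDs.
    // Relies on the assumption that a node holds at most one copy of a given shard.
    static Optional<ShardCopy> findStartedCopy(List<ShardCopy> assigned, ShardCopy simulated) {
        return assigned.stream()
            .filter(copy -> copy.started()
                && copy.shardId().equals(simulated.shardId())
                && copy.primary() == simulated.primary()
                && copy.nodeId().equals(simulated.nodeId()))
            .findFirst();
    }

    public static void main(String[] args) {
        // The simulated copy carries an allocation ID that the real copy never has.
        ShardCopy simulated = new ShardCopy("logs[0]", true, "node-2", "simulated-id", false);
        List<ShardCopy> assigned = List.of(
            new ShardCopy("logs[0]", true, "node-2", "actual-id", true),
            new ShardCopy("logs[0]", false, "node-3", "replica-id", true)
        );
        System.out.println(findStartedCopy(assigned, simulated).isPresent());
    }
}
```

The match succeeds even though the two allocation IDs differ, which is the trade-off described above: it is cheap and good enough within one polling cycle, but it cannot tell apart two successive allocations of the same copy to the same node.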

ywangd · 2025-08-27T11:28:33Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

+        // A new ClusterInfo has arrived, clear the tracking for started shards
+        if (lastClusterInfo != desiredBalanceInput.routingAllocation().clusterInfo()) {
+            lastClusterInfo = desiredBalanceInput.routingAllocation().clusterInfo();
+            shardsStartedByAllocate = new HashMap<>();
+        }


I think one issue with this approach is that a shard allocated in the last ClusterInfo polling cycle but started in the current one is not accounted for, i.e. the following sequence of events:

1st ClusterInfo poll

Allocator moves a shard, but it has not started on the target node

2nd ClusterInfo poll

The shard starts on the target node

I think we want the started shard contributing to the simulations performed in the 2nd ClusterInfo polling cycle?
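The sequence above can be reproduced with a toy tracker. This is only a sketch under stated assumptions: `StartedShardTrackerSketch` and its methods are hypothetical names, a plain `Object` stands in for `ClusterInfo`, and the reset mirrors the identity comparison in the diff rather than the actual class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker that resets whenever a new ClusterInfo instance is observed.
final class StartedShardTrackerSketch {
    private Object lastClusterInfo;                        // stand-in for ClusterInfo, compared by identity
    private Map<String, String> tracked = new HashMap<>(); // shard -> target node

    // Called at the start of each compute; clears the tracking on a new ClusterInfo.
    void onCompute(Object clusterInfo) {
        if (lastClusterInfo != clusterInfo) {
            lastClusterInfo = clusterInfo;
            tracked = new HashMap<>();
        }
    }

    void trackAllocation(String shard, String targetNode) {
        tracked.put(shard, targetNode);
    }

    boolean isTracked(String shard) {
        return tracked.containsKey(shard);
    }

    public static void main(String[] args) {
        StartedShardTrackerSketch tracker = new StartedShardTrackerSketch();
        tracker.onCompute(new Object());              // 1st ClusterInfo poll
        tracker.trackAllocation("logs[0]", "node-2"); // allocator moves the shard; it has not started yet
        tracker.onCompute(new Object());              // 2nd ClusterInfo poll resets the tracking
        // The shard now starts on node-2, but the tracker no longer knows about it.
        System.out.println(tracker.isTracked("logs[0]"));
    }
}
```

In this toy model the tracking entry is gone by the time the shard starts, which is the gap the comment describes. Whether it matters in practice depends on what else covers started shards in the second cycle.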

Would that not be simulated by the shard started done in the beginning of compute?

nicktindall

I don't mind the approach but I think it adds some complexity and state to an already quite complex/stateful bit of code

nicktindall · 2025-08-28T03:13:48Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

+            shardsStartedByAllocate = new HashMap<>();
+        }
+
+        final var alreadySimulatedStartedShards = new HashSet<ShardForSimulation>();


Could we ask the simulator for the "alreadySimulatedShards" instead of tracking them alongside the simulateShardStarted calls? or am I missing something?

Do you mean tracking it inside ClusterInfoSimulator? Yes that's possible. It does not do that currently but we can make it do so.
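As a rough illustration of that suggestion, the sketch below shows a simulator that records which shards it has already simulated as started, so callers can query it instead of keeping a parallel set. `RecordingSimulatorSketch` and all of its members are hypothetical; as noted above, `ClusterInfoSimulator` does not currently do this.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical simulator that remembers which shard starts it has already applied.
final class RecordingSimulatorSketch {
    private final Set<String> simulatedStarted = new LinkedHashSet<>();
    private long simulatedDiskUsageBytes;

    // Applies the effect of a shard start once; returns false if it was already simulated.
    boolean simulateShardStarted(String shardKey, long shardSizeBytes) {
        if (!simulatedStarted.add(shardKey)) {
            return false;
        }
        simulatedDiskUsageBytes += shardSizeBytes;
        return true;
    }

    // Read-only view for callers that would otherwise track this themselves.
    Set<String> alreadySimulatedShards() {
        return Collections.unmodifiableSet(simulatedStarted);
    }

    long simulatedDiskUsageBytes() {
        return simulatedDiskUsageBytes;
    }

    public static void main(String[] args) {
        RecordingSimulatorSketch simulator = new RecordingSimulatorSketch();
        simulator.simulateShardStarted("logs[0]@node-2", 1024L);
        simulator.simulateShardStarted("logs[0]@node-2", 1024L); // duplicate, ignored
        System.out.println(simulator.alreadySimulatedShards() + " " + simulator.simulatedDiskUsageBytes());
    }
}
```

Keeping the record inside the simulator means the "already simulated" set cannot drift from the calls that actually changed the simulated state, at the cost of making the simulator itself stateful in one more way.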

ywangd · 2025-08-28T04:23:54Z

> it adds some complexity and state to an already quite complex/stateful bit of code

I think it will have to add some complexity. But if we track the real shard started events, the complexity in DesiredBalanceComputer might be a bit lower. I am thinking of switching to that, also because of this comment.

henningandersen

Left a few initial comments, did not get into the weeds of the started simulations yet

henningandersen · 2025-08-28T08:04:05Z

server/src/main/java/org/elasticsearch/cluster/InternalClusterInfoService.java

+        return currentClusterInfo;
+    }
+
+    private void updateAndGetCurrentClusterInfo() {


The method name here hints that it should return the cluster info? That would seem nice to do, but I'd also be fine to just call it updateClusterInfo

henningandersen · 2025-08-28T08:05:47Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

+        // A new ClusterInfo has arrived, clear the tracking for started shards
+        if (lastClusterInfo != desiredBalanceInput.routingAllocation().clusterInfo()) {
+            lastClusterInfo = desiredBalanceInput.routingAllocation().clusterInfo();
+            shardsStartedByAllocate = new HashMap<>();
+        }


Would that not be simulated by the shard started done in the beginning of compute?

henningandersen · 2025-08-28T08:09:05Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

@@ -77,6 +80,8 @@ public class DesiredBalanceComputer {
    private long lastConvergedTimeMillis;
    private long lastNotConvergedLogMessageTimeMillis;
    private Level convergenceLogMsgLevel;
+    private ClusterInfo lastClusterInfo;
+    private Map<ShardForSimulation, ShardRouting> shardsStartedByAllocate;


Can we see this as input instead? I think I prefer to track this outside this class rather than being this stateful here.

It could even reside on or be accessible through the ClusterInfo object?

ywangd · 2025-08-28T09:04:34Z

I got a new idea after talking to Henning. I'll rework this PR. Please hold off on your reviews. Thanks! 😅

ywangd added 5 commits August 26, 2025 19:57

[Test] Test to verify ClusterInfoSimulator update for each compute cycle

15a3ee4

Relates: ES-12723

enhance the test to include actual shard started event

bf0629e

Merge remote-tracking branch 'origin/main' into ES-12723-test

5a14682

Track startedShards in compute

d8a84d3

improve handling for resetting desired balance

c2d1b36

ywangd requested review from nicktindall, DiannaHohensee, mhl-b and henningandersen August 27, 2025 08:36

ywangd added the >enhancement, :Distributed Coordination/Allocation, and v9.2.0 labels Aug 27, 2025

Update docs/changelog/133630.yaml

4293f15

ywangd commented Aug 27, 2025

more comment

c3eacda

ywangd commented Aug 27, 2025

nicktindall reviewed Aug 28, 2025

henningandersen reviewed Aug 28, 2025
