ISR-Based Custom Goal #2273

Arc676 · 2025-05-12T13:05:10Z

TL;DR: Does Cruise Control support goals that use information about the in-sync replica sets to determine whether they are fulfilled? How might one implement such a goal?

Let me preface this by saying that yes, I am aware that the intended setup for ensuring high availability and minimum risk of data loss is to have three separate datacenters. This discussion is about Cruise Control's capabilities for a two-datacenter setup.

Suppose you have a topic with 3 replicas and min.insync.replicas = 2 distributed among machines from two racks¹ that are isolated from each other and therefore safe to fail without impacting the other. Kafka's rack-awareness feature can be used to ensure that the replicas are not all allocated to machines from the same rack, ensuring that one replica is isolated from the others. However, this constraint only applies during initial topic distribution and not while the topic is in use.

Under the described configuration, the topic continues to be usable if one replica falls behind, even if this is the single separate replica. If the two replicas in the same rack are in-sync while the third replica, which is on the other rack, is out of sync, then the newest messages for that topic are all reliant on the same rack and this rack becomes a single point of failure. If it becomes unavailable, we are forced to choose between service availability and data integrity, because the data on the other rack is out of sync and we might not be able to immediately recover the other rack². In other words, either you keep the topic available but lose the newest messages, or you wait until the data can be recovered from the rack that failed.
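To make the failure mode concrete, here is a minimal Python sketch (not Cruise Control code; the `Replica` type and the `racks_holding_all_isr` helper are hypothetical names invented for illustration). It flags the rack whose loss would take every in-sync replica with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of one replica of a partition."""
    broker_id: int
    rack: str
    in_sync: bool


def racks_holding_all_isr(replicas):
    """Return the racks whose loss would remove every in-sync replica."""
    isr_racks = {r.rack for r in replicas if r.in_sync}
    # If exactly one rack hosts the whole ISR, that rack is a single point of failure.
    return isr_racks if len(isr_racks) == 1 else set()


# Two in-sync replicas on rack "A", one lagging replica on rack "B".
partition = [
    Replica(broker_id=1, rack="A", in_sync=True),
    Replica(broker_id=2, rack="A", in_sync=True),
    Replica(broker_id=3, rack="B", in_sync=False),
]
print(racks_holding_all_isr(partition))  # {'A'}
```

If the lagging replica on rack "B" were in sync instead of one of the replicas on rack "A", the function would return an empty set, because no single rack would hold all of the up-to-date data.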

There will always be an element of risk when only two datacenters are involved, but one could mitigate this by implementing a custom goal for Cruise Control with the following heuristics:

1. If the number of in-sync replicas equals the replica count, then the goal is fulfilled.
2. Otherwise, determine the racks on which the in-sync replicas are located:
   - If the in-sync replicas are on separate racks, then the goal is fulfilled.
   - Otherwise, the in-sync replicas are all on the same rack. Propose moving one of them to a broker on another rack.
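
The steps above can be sketched roughly as follows. This is plain Python for illustration only, not an implementation of Cruise Control's Java `Goal` interface; `Replica`, `propose_isr_move`, and the `brokers_by_rack` mapping are hypothetical. It also assumes that a partition with fewer than two in-sync replicas yields no proposal, a case the list above does not cover.

```python
from collections import namedtuple

# Hypothetical view of one replica of a partition.
Replica = namedtuple("Replica", ["broker_id", "rack", "in_sync"])


def propose_isr_move(replicas, brokers_by_rack):
    """Return None if there is nothing to propose, else (source_broker, destination_broker)."""
    isr = [r for r in replicas if r.in_sync]
    # Step 1: every replica is in sync, so the goal is fulfilled.
    if len(isr) == len(replicas):
        return None
    # Assumption: with fewer than two in-sync replicas there is nothing to spread out.
    if len(isr) < 2:
        return None
    isr_racks = {r.rack for r in isr}
    # Step 2a: the in-sync replicas already span more than one rack.
    if len(isr_racks) > 1:
        return None
    # Step 2b: all in-sync replicas share one rack; pick a free broker elsewhere.
    crowded_rack = next(iter(isr_racks))
    occupied = {r.broker_id for r in replicas}
    for rack, broker_ids in sorted(brokers_by_rack.items()):
        if rack == crowded_rack:
            continue
        for broker_id in sorted(broker_ids):
            if broker_id not in occupied:
                return (isr[-1].broker_id, broker_id)
    return None  # no eligible destination broker on another rack


replicas = [
    Replica(broker_id=1, rack="A", in_sync=True),
    Replica(broker_id=2, rack="A", in_sync=True),
    Replica(broker_id=3, rack="B", in_sync=False),
]
brokers_by_rack = {"A": [1, 2, 4], "B": [3, 5]}
print(propose_isr_move(replicas, brokers_by_rack))  # (2, 5)
```

In a real cluster the relocated replica would still have to catch up on its new broker before it counts as in sync, so a proposal like this would reduce exposure over time rather than instantly.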

Simply put, this goal would ensure that in-sync replicas are spread out among different racks as much as possible. Even if a critical mass of brokers goes offline, there is always an in-sync copy of the data on a machine in a different rack from which the replica set can be reconstructed.

However, in order to implement such a goal, Cruise Control would have to provide access to the ISR state to the goal. Based on a cursory reading of the source code, this does not appear to be the case to me. The goals are given access to the data in ClusterModel, but this does not include the state of the ISR sets. The last time this data is available appears to be a couple levels up, when the optimization options are being computed while there is access to an instance of KafkaCruiseControl, which has a LoadMonitor through which one could access the cluster state.

Am I missing something? Or is it possible to implement such a goal without any refactoring in Cruise Control itself? Or perhaps there is a completely different way to view this problem that I'm not considering.

¹ We'll use "racks" to generically refer to isolated machines used as Kafka brokers. This is to reflect Kafka's use of the term for its rack awareness feature. Of course, this is an abstraction and we could instead identify an entire datacenter as a single rack. What matters is that the "racks" are not connected to the same points of failure.

² In the context of a two-datacenter setup, this is particularly relevant for disaster scenarios where an entire datacenter is affected. If the in-sync replicas are all in the same datacenter and there is a problem, the other datacenter can't be used to bring the topic back online without losing data.

Answered by danielgospodinow


kyguy · 2025-07-16T18:05:47Z

> Let me preface this by saying that yes, I am aware that the intended setup for ensuring high availability and minimum risk of data loss is to have three separate datacenters. This discussion is about Cruise Control's capabilities for a two-datacenter setup.

Kafka was not designed to operate as a single cluster across multiple data centers. It should be deployed within a single data center to ensure low latency and reliability. If cross-data center redundancy is required, the data should be replicated to a separate Kafka cluster running in a separate data center using tools like MirrorMaker 2.

> Simply put, this goal would ensure that in-sync replicas are spread out among different racks as much as possible. Even if a critical mass of brokers goes offline, there is always an in-sync copy of the data on a machine in a different rack from which the replica set can be reconstructed.

Using the RackAware and LeaderReplicaDistribution goals helps ensure general rack diversity and better leader placement. A custom "LeaderRackDistribution" goal could improve this further, but even that wouldn't guarantee that the in-sync replicas are evenly distributed across racks. To achieve that, a new goal would be required.

> However, in order to implement such a goal, Cruise Control would have to provide access to the ISR state to the goal. Based on a cursory reading of the source code, this does not appear to be the case to me. The goals are given access to the data in ClusterModel, but this does not include the state of the ISR sets.

You're right: Cruise Control goals currently don't have access to ISR (in-sync replica) information. To implement such a goal, the ClusterModel would need to be extended to include ISR information. Once that is available, a custom goal could be created to make rack-aware partition movement decisions based on the actual in-sync replica information during the optimization proposal generation process.

2 replies

Arc676 Jul 17, 2025
Author

Thanks for the detailed response. I'll look into the LeaderReplicaDistribution goal.

> Kafka was not designed to operate as a single cluster across multiple data centers.

My terminology might be incorrect here. My understanding of the conventional wisdom is that preventing data loss requires at least 3 copies, since this is the minimum number of replicas with which you can tolerate one failure and still have multiple copies (in general, $2n + 1$ copies to tolerate $n$ failures while still retaining a majority of the replicas). This, of course, isn't tied to data centers, but if you add in disaster recovery then this implies that the three copies should be in three different locations and therefore separate DCs. But I do understand your point; mirroring the clusters is something to be taken into consideration.
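
The counting behind the $2n + 1$ figure can be written out as a short worked step. This is generic majority arithmetic, shown here only as an illustration, and is not a statement about how Kafka's ISR mechanism decides availability:

$$
\underbrace{(2n + 1)}_{\text{copies}} - \underbrace{n}_{\text{failures}} = n + 1 > \frac{2n + 1}{2},
\qquad \text{e.g. } n = 1:\; 3 - 1 = 2 > 1.5
$$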

kyguy Jul 17, 2025

> My understanding of the conventional wisdom is that preventing data loss requires at least 3 copies, since this is the minimum number of replicas with which you can tolerate one failure and still have multiple copies (in general, $2n + 1$ copies to tolerate $n$ failures while still retaining a majority of the replicas).

Your terminology makes sense, and your reasoning is sound. A replication factor of 3 is a common best practice in Kafka to protect against single-node or single-replica failure.

> This, of course, isn't tied to data centers, but if you add in disaster recovery then this implies that the three copies should be in three different locations and therefore separate DCs. But I do understand your point; mirroring the clusters is something to be taken into consideration.

You're absolutely right, keeping all three replicas in the same data center leaves you exposed to site-level failures. It's just that while it's technically possible to run a single Kafka cluster across multiple DCs, it introduces significant complexity around latency and cluster stability. Unless you have strong guarantees about the network between the data centers, a more robust and maintainable approach would be to run separate Kafka clusters in each data center and mirror between them using a tool like MirrorMaker 2.

danielgospodinow · 2025-07-17T07:44:48Z

> Kafka was not designed to operate as a single cluster across multiple data centers.

From experience, I do think this setup is pretty common, though.

> To achieve that, a new goal would be required.

I imagine the RackAwareGoal could have this additional check while distributing replicas, provided that the cluster model contained the ISR info. That said, I do agree it would be cleaner to have a separate goal for handling this case. But that would then bring up the question of what the priority of this goal should be. Right after the RackAwareGoal?

So I see this item as two separate action items (AIs):

1. Enhance CC's cluster model to include ISR info (which, by the way, can considerably increase CC's memory footprint in larger clusters; this should be carefully examined).
2. Enhance CC's rack awareness goal to take ISR info into consideration while rebalancing partitions, OR create a new hard goal that ensures this, and maybe place it right after the RackAwareGoal.

5 replies

Arc676 Jul 17, 2025
Author

Thanks for replying. I had not considered the memory footprint problem; if the cluster model is extended this way, it would probably make more sense for the feature to be opt-in. As described in the problem statement, the motivation behind this feature request of sorts is to mitigate data loss risks in a two-location setup. For a large cluster distributed across three locations (or more), the ISR-based goals become less beneficial, although they could certainly be useful even at the rack-level in a single data center.

Nothing major comes to mind regarding the two options for the second item, but as an extension of my previous point, it might be cleaner to implement this feature as a new goal. If the inclusion of ISR data is opt-in, then the logic regarding the ISR goal would also be opt-in. It might then make more sense to isolate this code from the general RackAwareGoal and simply disable or disallow the new goal in a cluster for which ISR data is not included in the cluster model.

kyguy Jul 17, 2025

> But that would then bring up the question of what the priority of this goal should be. Right after the RackAwareGoal?

Sounds reasonable to me.

> So I see this item as two separate AIs:
>
> 1. Enhance CC's cluster model to include ISR info (which, by the way, can considerably increase CC's memory footprint in larger clusters - this should be carefully examined)
> 2. Enhance CC's rack awareness goal to take ISR info into consideration while rebalancing partitions, OR create a new hard goal that ensures this, and maybe place it right after the RackAwareGoal

+1

Arc676 Aug 5, 2025
Author

If it seems like the appropriate continuation of this feature request for both of you, I'll mark this discussion as resolved and open new issues for the two aforementioned items. Or is there further internal discussion after which the issues will be opened based on whichever approach is decided on?

kyguy Aug 5, 2025

I think opening an issue as a feature request with the reasoning and potential paths forward discussed for this would be fine.

Arc676 Aug 6, 2025
Author

Discussion continued in #2300
