ISR-Based Custom Goal #2273
-
TL;DR: Does Cruise Control support goals that use information about the in-sync replica (ISR) sets to determine whether they are fulfilled? How might one implement such a goal?

Let me preface this by saying that yes, I am aware that the intended setup for ensuring high availability and minimum risk of data loss is to have three separate datacenters. This discussion is about Cruise Control's capabilities for a two-datacenter setup.

Suppose you have a topic with 3 replicas and [...] Under the described configuration, the topic continues to be usable if one replica falls behind, even if this is the single separate replica. If the two replicas in the same rack are in sync while the third replica, which is on the other rack, is out of sync, then the newest messages for that topic all depend on the same rack, and this rack becomes a single point of failure. If it becomes unavailable, we are forced to choose between service availability and data integrity, because the data on the surviving rack is out of sync and we might not be able to recover the failed rack immediately. In other words, either you keep the topic available but lose the newest messages, or you wait until the data can be recovered from the rack that failed.

There will always be an element of risk when only two datacenters are involved, but one could mitigate this by implementing a custom goal for Cruise Control with the following heuristics:
Simply put, this goal would ensure that in-sync replicas are spread out among different racks as much as possible. Even if a critical mass of brokers goes offline, there is always an in-sync copy of the data on a machine in a different rack from which the replica set can be reconstructed.

However, in order to implement such a goal, Cruise Control would have to give the goal access to the ISR state. Based on a cursory reading of the source code, this does not appear to be the case. The goals are given access to the data in [...]

Am I missing something? Or is it possible to implement such a goal without any refactoring in Cruise Control itself? Or perhaps there is a completely different way to view this problem that I'm not considering.
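To make the condition concrete, here is a minimal sketch of the check such a goal would need to evaluate for a single partition. It is plain Python rather than Cruise Control code; the function names, the broker-to-rack mapping, and the `min_racks` parameter are all hypothetical and only illustrate the idea of "in-sync replicas on more than one rack".

```python
from typing import Dict, Set


def isr_racks(replica_racks: Dict[int, str], isr: Set[int]) -> Set[str]:
    """Return the racks that host at least one in-sync replica.

    replica_racks maps broker id -> rack id for the brokers hosting the
    partition (hypothetical input, not a Cruise Control structure).
    """
    return {replica_racks[broker] for broker in isr if broker in replica_racks}


def isr_is_rack_redundant(
    replica_racks: Dict[int, str], isr: Set[int], min_racks: int = 2
) -> bool:
    """True if the in-sync replicas span at least min_racks racks."""
    return len(isr_racks(replica_racks, isr)) >= min_racks


# Two replicas on rack-a, one on rack-b, as in the scenario above.
replica_racks = {1: "rack-a", 2: "rack-a", 3: "rack-b"}

all_in_sync = isr_is_rack_redundant(replica_racks, {1, 2, 3})
rack_b_lagging = isr_is_rack_redundant(replica_racks, {1, 2})
one_rack_a_lagging = isr_is_rack_redundant(replica_racks, {1, 3})
```

With these inputs, the case where broker 3 falls out of sync is the only one that fails the check, which matches the single-point-of-failure situation described above.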
Replies: 2 comments 7 replies
-
Kafka was not designed to operate as a single cluster across multiple data centers. It should be deployed within a single data center to ensure low latency and reliability. If cross-data center redundancy is required, the data should be replicated to a separate Kafka cluster running in a separate data center using tools like MirrorMaker 2.
Using the
You're right, Cruise Control goals currently don't have access to ISR (in-sync replica) information. To implement such a goal, the ClusterModel would need to be extended to include ISR information. Once that is available, a custom goal could be created to make rack-aware partition movement decisions based on the actual in-sync replica information during the optimization proposal generation process.
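As a rough illustration of what "a model extended with ISR information" could enable, the sketch below uses a hypothetical `PartitionView` record (not Cruise Control's ClusterModel) that carries the ISR alongside the replica list. It finds partitions whose in-sync replicas all sit on one rack and suggests one possible movement. The movement policy is an assumption made for illustration only: relocating a lagging replica helps only if its current broker is the cause of the lag.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class PartitionView:
    """Hypothetical per-partition view that includes ISR state."""
    topic: str
    partition: int
    replicas: Tuple[int, ...]  # broker ids hosting a replica
    isr: FrozenSet[int]        # subset of replicas currently in sync


def find_violations(
    partitions: List[PartitionView], broker_racks: Dict[int, str]
) -> List[PartitionView]:
    """Return partitions whose in-sync replicas span fewer than two racks."""
    return [
        p for p in partitions
        if len({broker_racks[b] for b in p.isr}) < 2
    ]


def propose_move(
    p: PartitionView, broker_racks: Dict[int, str]
) -> Optional[Tuple[int, int]]:
    """Suggest (source, destination) brokers for one lagging replica.

    The destination is a broker on a rack without an in-sync replica that
    does not already host the partition. Returns None if no move exists.
    """
    racks_with_isr = {broker_racks[b] for b in p.isr}
    lagging = sorted(b for b in p.replicas if b not in p.isr)
    candidates = sorted(
        b for b, rack in broker_racks.items()
        if rack not in racks_with_isr and b not in p.replicas
    )
    if not lagging or not candidates:
        return None
    return lagging[0], candidates[0]


broker_racks = {1: "rack-a", 2: "rack-a", 3: "rack-b", 4: "rack-b"}
partitions = [
    PartitionView("orders", 0, (1, 2, 3), frozenset({1, 2})),
    PartitionView("orders", 1, (1, 3, 4), frozenset({1, 3, 4})),
]

violations = find_violations(partitions, broker_racks)
moves = [propose_move(p, broker_racks) for p in violations]
```

In this example only `orders-0` is flagged, and the suggested move takes the lagging replica from broker 3 to broker 4 on the same rack. A real goal would also have to weigh this against the other goals' constraints, which this sketch ignores.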
-
From experience, I do think this setup is pretty common, though.
I imagine the RackAwareGoal could have this additional check while distributing replicas, given that the cluster model contains the ISR info. Although I do agree it would be cleaner to just have a separate goal for handling this case. But that would then bring up the question of what the priority of this goal should be. Right after the RackAwareGoal?
So I see this item as two separate AIs: