In the case of a network partition or a complete data-center failure, the remaining etcd members must still hold quorum to continue operating. Since quorum is a majority of the members, there is no way to split members across two data centers so that failover always succeeds. E.g. with 3 members in a 2+1 split, if the DC with 2 members fails, the remaining single member won't be operational. As for the other part of your question, there's always a load balancer of some sort in front of the Kubernetes control plane, and you can configure it to direct traffic to the available DC. You could split your control-plane nodes across 3 DCs/AZs; in that case a failure of a single DC still leaves the other two DCs operational. Keep in mind, though, that from etcd's point of view a loss of network connectivity is equivalent to a failure, so at any moment two DCs must remain connected. etcd is also sensitive to network latency, which is why these failure domains are usually availability zones.
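To make the majority math above concrete, here is a small sketch computing quorum and tolerated failures for common cluster sizes (quorum is floor(n/2)+1, and the cluster survives n minus quorum failures):

```shell
# For an etcd cluster of n members, quorum is floor(n/2)+1 and the
# cluster tolerates (n - quorum) simultaneous member failures.
for n in 1 3 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
# members=1 quorum=1 tolerated_failures=0
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

This is exactly why a 2+1 split across two DCs fails: losing the 2-member DC leaves 1 member, which is below the quorum of 2.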
I have a scenario where I want to deploy a Talos cluster spanning two different physical data centers for increased availability. The obvious problem with this in Kubernetes is etcd's need for quorum.
So, what I'm wondering is whether there is any good way to "manually fail over" the control plane in a Talos Linux cluster in the event that the primary data center goes down. Would simply running `talosctl etcd remove-member` to remove the other nodes work?
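For context, a sketch of what that manual step might look like, with placeholder node names and the caveat that the exact commands should be checked against the Talos documentation for your version. Note that etcd membership changes are themselves committed through Raft, so `remove-member` only works while the cluster still has quorum; once quorum is already lost, the Talos docs describe a separate disaster-recovery path based on restoring from an etcd snapshot instead.

```shell
# While quorum still holds: list members from a surviving control-plane node,
# then remove the unreachable members from the lost data center.
talosctl -n <surviving-node-ip> etcd members
talosctl -n <surviving-node-ip> etcd remove-member <lost-node-hostname>
```

If the 2-member DC is already gone in a 2+1 layout, these commands cannot succeed on the lone survivor, which is the crux of the quorum problem discussed above.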