
Replace StatefulSets with a customized controller #362

Description

@ahrtr

Problem

Currently, etcd-operator uses StatefulSets to manage the etcd members/Pods. In the happy path, this has worked well so far. But repairing a failed member/Pod is hard to do with a StatefulSet.

A StatefulSet assigns an ordinal index to each Pod and always scales down from the highest ordinal. If member-1 is corrupt but member-0 and member-2 are healthy, the StatefulSet forces you to remove member-2 first.
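To make the ordering problem concrete, here is a toy Go model (not Kubernetes client code) of which Pods get deleted when we want to get rid of member 1 in a 3-member cluster. The `member-N` names and both helper functions are hypothetical; real StatefulSet Pods are named `<statefulset-name>-<ordinal>`.

```go
package main

import "fmt"

// statefulSetRemovalOrder models StatefulSet scale-down: Pods are deleted
// from the highest ordinal downwards, so reaching ordinal `target` means
// deleting every Pod with a higher ordinal first.
func statefulSetRemovalOrder(name string, replicas, target int) []string {
	var removed []string
	for ord := replicas - 1; ord >= target; ord-- {
		removed = append(removed, fmt.Sprintf("%s-%d", name, ord))
	}
	return removed
}

// perMemberRemovalOrder models a controller that owns each member's Pod
// individually (hypothetical): it deletes exactly the failed member.
func perMemberRemovalOrder(name string, target int) []string {
	return []string{fmt.Sprintf("%s-%d", name, target)}
}

func main() {
	// 3-member cluster where member-1 is corrupt.
	fmt.Println(statefulSetRemovalOrder("member", 3, 1)) // [member-2 member-1]
	fmt.Println(perMemberRemovalOrder("member", 1))      // [member-1]
}
```

With the StatefulSet, the healthy member-2 is collateral damage; with a per-member controller, only member-1 is touched.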

Also, the controller currently manages StatefulSet.spec.replicas, but the StatefulSet only manages "how many Pods to run"; it knows nothing about etcd membership. This introduces a reconciliation gap: we must constantly reconcile the StatefulSet replica count against the etcd member count.
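A rough sketch of why identity beats counts: if every etcd member maps 1-to-1 to a Pod, the controller can diff the two sets by name instead of comparing numbers. This assumes, hypothetically, that each Pod is named after its etcd member; `diffMembersAndPods` is illustrative only. In the example both counts are 3, so a replica-count check would see nothing wrong, while the set diff finds the member without a Pod and the Pod without a member.

```go
package main

import (
	"fmt"
	"sort"
)

// diffMembersAndPods compares the etcd member list with the Pods the
// controller owns, keyed by name (assumes Pod name == member name).
// It returns members that have no Pod and Pods that match no member.
func diffMembersAndPods(members, pods []string) (missingPods, orphanPods []string) {
	podSet := make(map[string]bool, len(pods))
	for _, p := range pods {
		podSet[p] = true
	}
	memberSet := make(map[string]bool, len(members))
	for _, m := range members {
		memberSet[m] = true
		if !podSet[m] {
			missingPods = append(missingPods, m)
		}
	}
	for _, p := range pods {
		if !memberSet[p] {
			orphanPods = append(orphanPods, p)
		}
	}
	// Sort for deterministic output.
	sort.Strings(missingPods)
	sort.Strings(orphanPods)
	return missingPods, orphanPods
}

func main() {
	members := []string{"etcd-a", "etcd-b", "etcd-c"}
	pods := []string{"etcd-a", "etcd-c", "etcd-d"}
	// Counts match (3 vs 3), yet the sets disagree.
	missing, orphans := diffMembersAndPods(members, pods)
	fmt.Println(missing, orphans) // [etcd-b] [etcd-d]
}
```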

Proposal

The solution is to introduce a customized controller to manage the etcd members/Pods. We have discussed multiple solutions several times in our community meetings:

  • Solution 1: we manage Pods directly; each etcd member maps 1-to-1 to a Pod owned directly by the EtcdCluster.
  • Solution 2: we manage StatefulSets or Deployments directly; each etcd member maps 1-to-1 to a StatefulSet or Deployment owned by the EtcdCluster, with replicas set to 1.
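With solution 1 the controller renders each member's Pod itself, so it also has to produce each member's etcd flags instead of relying on a shared StatefulSet template. A minimal sketch, assuming plain HTTP (TLS left out), the standard ports 2379/2380, and a hypothetical `<member>.<service>` DNS scheme; `memberArgs` is illustrative, while the flags themselves are real etcd flags. `--initial-cluster-state` is `new` when bootstrapping and `existing` when a member joins a running cluster.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// memberArgs renders the etcd command line for one member's Pod.
// service: hypothetical DNS suffix; peers: all member names in --initial-cluster;
// existing: true when joining a running cluster; extra: user-supplied etcd options.
func memberArgs(name, service string, peers []string, existing bool, extra map[string]string) []string {
	host := fmt.Sprintf("%s.%s", name, service) // hypothetical DNS scheme
	var initial []string
	for _, p := range peers {
		initial = append(initial, fmt.Sprintf("%s=http://%s.%s:2380", p, p, service))
	}
	state := "new"
	if existing {
		state = "existing"
	}
	args := []string{
		"--name=" + name,
		"--listen-peer-urls=http://0.0.0.0:2380",
		"--listen-client-urls=http://0.0.0.0:2379",
		"--initial-advertise-peer-urls=http://" + host + ":2380",
		"--advertise-client-urls=http://" + host + ":2379",
		"--initial-cluster=" + strings.Join(initial, ","),
		"--initial-cluster-state=" + state,
	}
	// Append user-supplied options in a stable (sorted) order.
	keys := make([]string, 0, len(extra))
	for k := range extra {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		args = append(args, fmt.Sprintf("--%s=%s", k, extra[k]))
	}
	return args
}

func main() {
	// etcd-b joining a cluster that already contains etcd-a.
	args := memberArgs("etcd-b", "etcd", []string{"etcd-a", "etcd-b"}, true,
		map[string]string{"quota-backend-bytes": "8589934592"})
	for _, a := range args {
		fmt.Println(a)
	}
}
```

User-supplied options are simply appended after the generated flags; how conflicts between the two should be handled is left open here.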

We believe the first solution is the most flexible. Ideally, we would deliver a PoC for each solution and decide which one to use later, but we might not have the bandwidth. Let's start with solution 1 and deliver a PoC for it first. If it turns out not to need a huge effort and is easy to understand and maintain, we might proceed with it.

Note that we haven't released v1.0 yet, so we're happy to discard the existing design and implementation and start over in the future if necessary.

Requirements

We don't need to deliver a complete and perfect solution right now. Instead, let's do it step by step. The first step is just to replace the StatefulSet with a customized controller, as described in solution 1 above. We won't add new features in the first step; we just keep the existing functionality:

  • Users are able to create a brand new etcd cluster of any size
  • Users are able to scale an existing etcd cluster in and out
  • Users are able to configure the certificate providers: auto, cert-manager or disabled
  • Users are able to configure a storageClass to integrate with an existing CSI driver
  • Users are able to customize the etcd options
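Since each member is addressed individually, scaling in and out can be done as one membership change per reconcile loop, and scale-in can choose which member to remove. A minimal sketch, assuming a hypothetical `Member`/`Action` model and a caller-supplied unique name for new members: it removes an unhealthy member first, and it refuses to scale out while any member is unhealthy, because, for example, adding a 4th member to a 3-member cluster with one member down raises the quorum to 3 while only 2 members are up until the new one starts.

```go
package main

import (
	"fmt"
	"sort"
)

// Member is a hypothetical view of one etcd member and its health.
type Member struct {
	Name    string
	Healthy bool
}

// Action is the single step the controller takes in this reconcile loop.
type Action struct {
	Kind   string // "add", "remove", "wait" or "none"
	Member string
}

// nextScaleAction returns at most one membership change towards `desired`.
func nextScaleAction(desired int, members []Member, newName string) Action {
	sorted := append([]Member(nil), members...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var unhealthy []string
	for _, m := range sorted {
		if !m.Healthy {
			unhealthy = append(unhealthy, m.Name)
		}
	}

	switch {
	case len(sorted) > desired:
		// Scale in: drop a broken member before touching healthy ones.
		if len(unhealthy) > 0 {
			return Action{Kind: "remove", Member: unhealthy[0]}
		}
		return Action{Kind: "remove", Member: sorted[len(sorted)-1].Name}
	case len(sorted) < desired:
		// Scale out only when adding a member cannot cost us quorum.
		if len(unhealthy) > 0 {
			return Action{Kind: "wait"}
		}
		return Action{Kind: "add", Member: newName}
	default:
		return Action{Kind: "none"}
	}
}

func main() {
	members := []Member{{"etcd-a", true}, {"etcd-b", false}, {"etcd-c", true}}
	fmt.Println(nextScaleAction(2, members, ""))       // {remove etcd-b}
	fmt.Println(nextScaleAction(4, members, "etcd-d")) // {wait }
	healthy := []Member{{"etcd-a", true}, {"etcd-c", true}}
	fmt.Println(nextScaleAction(3, healthy, "etcd-d")) // {add etcd-d}
}
```

Picking the last healthy member by name on scale-in is an arbitrary placeholder; a real controller might, for example, also avoid removing the current leader.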

cc @hakman @ivanvc @jberkus @nwnt
