fix: stop swallowing the client-certificate error#374
Conversation
When an EtcdCluster requests TLS, fetchAndValidateState provisions a client certificate for the operator via createClientCertificate. The error from that call was logged and then ignored, so reconciliation continued even though the credential the data path depends on was never created. Return a requeue with backoff (the existing requeueDuration, matching the sibling error paths in this function) on failure so the provisioning is retried instead of silently swallowed. Adds a unit test driving fetchAndValidateState with a fake client whose scheme omits the cert-manager types, forcing the certificate lookup to fail, and asserts the result is a non-zero RequeueAfter and that no reconcileState is returned (reconcile does not proceed past cert provisioning). Refs etcd-io#370 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: xrl The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @xrl. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
When an
EtcdClusterrequests TLS,fetchAndValidateStateprovisions the operator's client certificate viacreateClientCertificate. That error was logged and then ignored, so reconciliation proceeded even though the credential the TLS data path depends on was never created — the failure is invisible and surfaces later as a more confusing downstream error.This returns a requeue with backoff (the existing
requeueDuration, matching the sibling error paths in the same function) on failure, so provisioning is retried instead of silently swallowed.Adds a unit test that drives
fetchAndValidateStatewith a fake client whose scheme omits the cert-manager types (forcing the certificate lookup to fail) and asserts a non-zeroRequeueAfterwith noreconcileStatereturned (reconcile does not proceed past cert provisioning).Refs #370
Related work — etcd TLS & operability
Independent peer/client TLS reshape and surrounding operability work, in dependency / stacking order (
→marks this PR):spec.tls.{peer,client}surfaces; breaking alpha API change (no conversion webhook, by design)TLSReadycondition + TLS lifecycle EventsPeerCANotSharedThe TLS reshape (#376) supersedes the earlier conflated T2/T3/T4 plan (per-surface mounts, flags+scheme, and client
*tls.Confignow all live in #376). T5←#376 and T6←#377 are stacked: review/merge in order. T0, the reconcile/QoS knobs, and the stress harness are independent.