Partially-installed Releases are never retried #416

@nojnhuh

What steps did you take and what happened:

I haven't found a reliable way to reproduce this yet, but I observed CAAPH hit a transient failure when installing a chart:

E0729 01:35:05.341942       1 controller.go:324] "Reconciler error" err="Kubernetes cluster unreachable: Get \"https://capz-0bei2n-6bb3419a.canadacentral.cloudapp.azure.com:6443/version?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)" controller="helmreleaseproxy" controllerGroup="addons.cluster.x-k8s.io" controllerKind="HelmReleaseProxy" HelmReleaseProxy="default/azuredisk-csi-driver-capz-0bei2n-c4phf" namespace="default" name="azuredisk-csi-driver-capz-0bei2n-c4phf" reconcileID="f99b16ad-79e0-41d4-a02a-d580336d4fe8"

This left the Helm Release in a pending-install state. In the next and every subsequent reconciliation of the HelmReleaseProxy, CAAPH reported the Release as "up to date" and took no further action, so the resources associated with the Release never finished installing:

I0729 01:35:15.083876       1 helm_client.go:370] "Release `azuredisk-csi-driver-oot` is up to date, no upgrade required, revision = 1" controller="helmreleaseproxy" controllerGroup="addons.cluster.x-k8s.io" controllerKind="HelmReleaseProxy" HelmReleaseProxy="default/azuredisk-csi-driver-capz-0bei2n-c4phf" namespace="default" name="azuredisk-csi-driver-capz-0bei2n-c4phf" reconcileID="ee9e4ebb-243f-421c-b0d7-08a77c30f577"

What did you expect to happen:

I expected CAAPH to notice that the install never completed and to retry or continue it.
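
To make the expected behavior concrete, here is a minimal sketch of the decision I have in mind. It is not CAAPH's actual code: `DecideAction`, `Action`, and the local `ReleaseStatus` type are hypothetical names for illustration, and only the status strings (`pending-install` and so on) match what Helm records for a release. A real fix would probably also need to tell a stale pending release apart from one that another process is still installing.

```go
package main

// ReleaseStatus is a local stand-in for the status string Helm stores on a release.
// The values below match Helm's own status strings; the type itself is illustrative.
type ReleaseStatus string

const (
	StatusDeployed        ReleaseStatus = "deployed"
	StatusPendingInstall  ReleaseStatus = "pending-install"
	StatusPendingUpgrade  ReleaseStatus = "pending-upgrade"
	StatusPendingRollback ReleaseStatus = "pending-rollback"
)

// Action is a hypothetical label for what the reconciler should do next.
type Action string

const (
	ActionNone    Action = "none"
	ActionInstall Action = "install"
	ActionUpgrade Action = "upgrade"
	ActionRetry   Action = "retry"
)

// IsPending reports whether an operation was started but never recorded as finished.
func (s ReleaseStatus) IsPending() bool {
	switch s {
	case StatusPendingInstall, StatusPendingUpgrade, StatusPendingRollback:
		return true
	}
	return false
}

// DecideAction sketches the check: a pending release is never treated as
// "up to date", even when the desired chart and values have not changed.
func DecideAction(exists bool, status ReleaseStatus, specChanged bool) Action {
	switch {
	case !exists:
		return ActionInstall
	case status.IsPending():
		return ActionRetry
	case specChanged:
		return ActionUpgrade
	default:
		return ActionNone
	}
}
```

With this ordering, the situation from the logs above (release exists, status pending-install, no spec change) yields a retry rather than "no upgrade required".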

Anything else you would like to add:

The full CAAPH logs come from this CAPZ load test run.

Environment:

  • Cluster API version: v1.10.4
  • Cluster API Add-on Provider for Helm version: v0.2.5
  • minikube/kind version:
  • Kubernetes version: (use kubectl version): v1.34.0-beta.0.567+dd4e4f1dd13f68
  • OS (e.g. from /etc/os-release):

/kind bug
