diff --git a/docs/book/src/SUMMARY.md b/docs/book/src/SUMMARY.md index 1852c6e08225..29c53fe7f99e 100644 --- a/docs/book/src/SUMMARY.md +++ b/docs/book/src/SUMMARY.md @@ -39,6 +39,7 @@ - [Implementing Runtime Extensions](./tasks/experimental-features/runtime-sdk/implement-extensions.md) - [Implementing Lifecycle Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md) - [Implementing Topology Mutation Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md) + - [Implementing Upgrade Plan Runtime Extensions](./tasks/experimental-features/runtime-sdk/implement-upgrade-plan-hooks.md) - [Deploying Runtime Extensions](./tasks/experimental-features/runtime-sdk/deploy-runtime-extension.md) - [Ignition Bootstrap configuration](./tasks/experimental-features/ignition.md) - [Running multiple providers](./tasks/multiple-providers.md) diff --git a/docs/book/src/images/runtime-sdk-lifecycle-hooks.png b/docs/book/src/images/runtime-sdk-lifecycle-hooks.png index 7153ee288aef..2cefd8508e85 100644 Binary files a/docs/book/src/images/runtime-sdk-lifecycle-hooks.png and b/docs/book/src/images/runtime-sdk-lifecycle-hooks.png differ diff --git a/docs/book/src/reference/glossary.md b/docs/book/src/reference/glossary.md index 506d466d948b..e24788794fb5 100644 --- a/docs/book/src/reference/glossary.md +++ b/docs/book/src/reference/glossary.md @@ -1,6 +1,6 @@ # Table of Contents -[A](#a) | [B](#b) | [C](#c) | [D](#d) | [E](#e) | [H](#h) | [I](#i) | [K](#k) | [L](#l)| [M](#m) | [N](#n) | [O](#o) | [P](#p) | [R](#r) | [S](#s) | [T](#t) | [W](#w) +[A](#a) | [B](#b) | [C](#c) | [D](#d) | [E](#e) | [H](#h) | [I](#i) | [K](#k) | [L](#l)| [M](#m) | [N](#n) | [O](#o) | [P](#p) | [R](#r) | [S](#s) | [T](#t) | [U](#u) |[W](#w) # A --- @@ -132,6 +132,15 @@ Cluster API IPAM Provider Metal3 ### CAREX Cluster API Runtime Extensions Provider Nutanix +### Chained upgrade +An upgrade sequence that goes from one Kubernetes version to 
another by passing through a set of intermediate versions. +E.g. upgrading from v1.31.0 (current state) to v1.34.0 (target version) requires +a chained upgrade with the following steps: v1.32.0 (first intermediate version) -> v1.33.0 (second intermediate version) -> v1.34.0 (target version). + +The sequence of versions in a chained upgrade is also called an [upgrade plan](#upgrade-plan). + +See also [efficient upgrade](#efficient-upgrade). + ### Cloud provider Or __Cloud service provider__ @@ -219,6 +228,14 @@ A feature implementation offered as part of the Cluster API project and maintain # E --- +### Efficient upgrade + +A [chained upgrade](#chained-upgrade) where worker nodes skip some of the intermediate versions, +when allowed by the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/). + +When the chained upgrade is also an efficient upgrade, the [upgrade plan](#upgrade-plan) for worker machines is a subset +of the [upgrade plan](#upgrade-plan) for control plane machines. + ### External patch [Patch](#patch) generated by an external component using [Runtime SDK](#runtime-sdk). Alternative to [inline patch](#inline-patch). @@ -460,6 +477,16 @@ A [Runtime Hook](#runtime-hook) that allows external components to generate [pat See [Topology Mutation](../tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md) +# U +--- + +### Upgrade plan +The sequence of intermediate versions ... target version that a Cluster must upgrade to when +performing a [chained upgrade](#chained-upgrade). + +Notably, the upgrade plan for control plane machines might be a superset of the upgrade plan for +worker machines. 
+ # W --- diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md index d23e3f9360ea..1e30e1443414 100644 --- a/docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md @@ -14,8 +14,24 @@ The lifecycle hooks allow hooking into the Cluster lifecycle. The following diag ![Lifecycle Hooks overview](../../../images/runtime-sdk-lifecycle-hooks.png) -Please see the corresponding [CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md) -for additional background information. +Please see the corresponding [CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md) as well as the proposal for [Chained and efficient upgrades](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md) +for additional background information. + + +* [Implementing Lifecycle Hook Runtime Extensions](#implementing-lifecycle-hook-runtime-extensions) + * [Introduction](#introduction) + * [Guidelines](#guidelines) + * [Definitions](#definitions) + * [BeforeClusterCreate](#beforeclustercreate) + * [AfterControlPlaneInitialized](#aftercontrolplaneinitialized) + * [BeforeClusterUpgrade](#beforeclusterupgrade) + * [BeforeControlPlaneUpgrade](#beforecontrolplaneupgrade) + * [AfterControlPlaneUpgrade](#aftercontrolplaneupgrade) + * [BeforeWorkersUpgrade](#beforeworkersupgrade) + * [AfterWorkersUpgrade](#afterworkersupgrade) + * [AfterClusterUpgrade](#afterclusterupgrade) + * [BeforeClusterDelete](#beforeclusterdelete) + ## Guidelines @@ -44,7 +60,7 @@ This hook is called after the Cluster object has been created by the user, immed are part of a Cluster topology(*) are going to be created. 
Runtime Extension implementers can use this hook to determine/prepare add-ons for the Cluster and block the creation of those objects until everything is ready. -#### Example Request: +Example Request: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 @@ -62,7 +78,7 @@ cluster: ... ``` -#### Example Response: +Example Response: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 @@ -83,7 +99,7 @@ This usually happens sometime during the first CP machine provisioning or immedi Runtime Extension implementers can use this hook to execute tasks, for example component installation on workload clusters, that are only possible once the Control Plane is available. This hook does not block any further changes to the Cluster. -#### Example Request: +Example Request: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 @@ -101,7 +117,7 @@ cluster: ... ``` -#### Example Response: +Example Response: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 @@ -116,18 +132,75 @@ This hook is called after the Cluster object has been updated with a new `spec.t immediately before the new version is going to be propagated to the control plane (*). Runtime Extension implementers can use this hook to execute pre-upgrade add-on tasks and block upgrades of the ControlPlane and Workers. +(*) Under normal circumstances `spec.topology.version` gets propagated to the control plane immediately; however +if previous upgrades or worker machine rollouts are still in progress, the system waits for those operations +to complete before starting the new upgrade. + Note: While the upgrade is blocked changes made to the Cluster Topology will be delayed propagating to the underlying objects while the object is waiting for upgrade. Example: modifying ControlPlane/MachineDeployments (think scale up), or creating new MachineDeployments will be delayed until the target ControlPlane/MachineDeployment is ready to pick up the upgrade. 
-This ensures that the ControlPlane and MachineDeployments do not perform a rollout prematurely while waiting to be rolled out again for the version upgrade (no double rollouts). +This ensures that the ControlPlane and MachineDeployments do not perform a rollout prematurely while waiting to be rolled out again +for the version upgrade (no double rollouts). This also ensures that any version specific changes are only pushed to the underlying objects also at the correct version. -#### Example Request: +Example Request: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 kind: BeforeClusterUpgradeRequest settings: +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... +fromKubernetesVersion: "v1.30.0" +toKubernetesVersion: "v1.33.0" +controlPlaneUpgrades: + - version: v1.31.0 + - version: v1.32.3 + - version: v1.33.0 +workersUpgrades: + - version: v1.32.3 + - version: v1.33.0 +``` + +Note: The `controlPlaneUpgrades` and the `workersUpgrades` fields contain the intermediate steps to reach the target version, +which is also included in the list. + +Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeClusterUpgradeResponse +status: Success # or Failure +message: "error message if status == Failure" +retryAfterSeconds: 10 +``` + +### BeforeControlPlaneUpgrade + +This hook is called before a new version is propagated to the control plane object, which happens as many times +as defined by the upgrade plan. + +Runtime Extension implementers can use this hook to execute pre-upgrade add-on tasks and block upgrades of the ControlPlane. + +Note: +- When an upgrade is starting, `BeforeControlPlaneUpgrade` will be called after `BeforeClusterUpgrade` is completed. 
+- When an upgrade is in progress, `BeforeControlPlaneUpgrade` will be called for each intermediate version that will + be applied to the control plane (`BeforeClusterUpgrade`, instead, will be called only once at the beginning of the upgrade). + +Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeControlPlaneUpgradeRequest +settings: cluster: apiVersion: cluster.x-k8s.io/v1beta1 kind: Cluster @@ -138,30 +211,43 @@ cluster: ... status: ... -fromKubernetesVersion: "v1.21.2" -toKubernetesVersion: "v1.22.0" +fromKubernetesVersion: "v1.30.0" +toKubernetesVersion: "v1.33.0" +controlPlaneUpgrades: + - version: v1.31.0 + - version: v1.32.3 + - version: v1.33.0 +workersUpgrades: + - version: v1.32.3 + - version: v1.33.0 ``` -#### Example Response: +Note: The `controlPlaneUpgrades` and the `workersUpgrades` fields contain the intermediate steps to reach the target version, +which is also included in the list. + +Example Response: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: BeforeClusterUpgradeResponse +kind: BeforeControlPlaneUpgradeResponse status: Success # or Failure message: "error message if status == Failure" retryAfterSeconds: 10 ``` -(*) Under normal circumstances `spec.topology.version` gets propagated to the control plane immediately; however - if previous upgrades or worker machine rollouts are still in progress, the system waits for those operations - to complete before starting the new upgrade. - ### AfterControlPlaneUpgrade -This hook is called after the entire control plane has been upgraded to the version specified in `spec.topology.version`, -and immediately before the new version is going to be propagated to the MachineDeployments of the Cluster. -Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks and block upgrades to workers -until everything is ready. 
+This hook is called after the control plane has been upgraded to the version specified in `spec.topology.version` +or to an intermediate version in the upgrade plan and: +- if workers upgrade can be skipped for this version and this is an intermediate version of an upgrade plan, + immediately before calling the `BeforeControlPlaneUpgrade` hook for the next version in the upgrade plan. +- if workers upgrade must be performed for this version, + immediately before calling the `BeforeWorkersUpgrade` hook for the same version. +- if the cluster does not have workers and this is the last version of an upgrade plan, + immediately before calling the `AfterClusterUpgrade` hook. + +Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks and block upgrades to the next +version of the control plane or to workers until everything is ready. Note: While the MachineDeployments upgrade is blocked changes made to existing MachineDeployments and creating new MachineDeployments will be delayed while the object is waiting for upgrade. Example: modifying MachineDeployments (think scale up), @@ -169,12 +255,60 @@ or creating new MachineDeployments will be delayed until the target MachineDeplo This ensures that the MachineDeployments do not perform a rollout prematurely while waiting to be rolled out again for the version upgrade (no double rollouts). This also ensures that any version specific changes are only pushed to the underlying objects also at the correct version. -#### Example Request: +Example Request: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 kind: AfterControlPlaneUpgradeRequest settings: +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... 
+kubernetesVersion: "v1.30.0" +controlPlaneUpgrades: + - version: v1.31.0 + - version: v1.32.3 + - version: v1.33.0 +workersUpgrades: + - version: v1.32.3 + - version: v1.33.0 +``` + +Note: The `controlPlaneUpgrades` and the `workersUpgrades` fields contain the intermediate steps to reach the target version, +which is also included in the list. + +Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterControlPlaneUpgradeResponse +status: Success # or Failure +message: "error message if status == Failure" +retryAfterSeconds: 10 +``` + +### BeforeWorkersUpgrade + +This hook is called before a new version is propagated to workers. Runtime Extension implementers +can use this hook to execute pre-upgrade add-on tasks and block upgrades of Workers. + +Note: +- This hook will be called only if workers upgrade must be performed for an intermediate version of a chained upgrade + or when upgrading to the target `spec.topology.version`. + +Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeWorkersUpgradeRequest +settings: cluster: apiVersion: cluster.x-k8s.io/v1beta1 kind: Cluster @@ -185,14 +319,73 @@ cluster: ... status: ... -kubernetesVersion: "v1.22.0" +fromKubernetesVersion: "v1.30.0" +toKubernetesVersion: "v1.33.0" +controlPlaneUpgrades: + - version: v1.31.0 + - version: v1.32.3 + - version: v1.33.0 +workersUpgrades: + - version: v1.32.3 + - version: v1.33.0 ``` -#### Example Response: +Note: The `controlPlaneUpgrades` and the `workersUpgrades` fields contain the intermediate steps to reach the target version, +which is also included in the list. 
+ +Example Response: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: AfterControlPlaneUpgradeResponse +kind: BeforeWorkersUpgradeResponse +status: Success # or Failure +message: "error message if status == Failure" +retryAfterSeconds: 10 +``` + +### AfterWorkersUpgrade + +This hook is called after all the workers have been upgraded to the version specified in `spec.topology.version` +or to an intermediate version in the upgrade plan, and: +- if the upgrade plan is completed and the entire cluster is at `spec.topology.version`, immediately before calling the `AfterClusterUpgrade` hook. +- if the upgrade plan is not complete and the entire cluster is now at one of the intermediate versions, immediately before + calling the `BeforeControlPlaneUpgrade` hook for the next intermediate step; in this case, the hook ensures that the control + plane can't move to the next version in the upgrade plan until `AfterWorkersUpgrade` is completed. + +Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterWorkersUpgradeRequest +settings: +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... +kubernetesVersion: "v1.30.0" +controlPlaneUpgrades: + - version: v1.31.0 + - version: v1.32.3 + - version: v1.33.0 +workersUpgrades: + - version: v1.32.3 + - version: v1.33.0 +``` + +Note: The `controlPlaneUpgrades` and the `workersUpgrades` fields contain the intermediate steps to reach the target version, +which is also included in the list. + +Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterWorkersUpgradeResponse status: Success # or Failure message: "error message if status == Failure" retryAfterSeconds: 10 @@ -202,9 +395,9 @@ retryAfterSeconds: 10 This hook is called after the Cluster, control plane and workers have been upgraded to the version specified in `spec.topology.version`. 
Runtime Extensions implementers can use this hook to execute post-upgrade add-on tasks. -This hook does not block any further changes or upgrades to the Cluster. +This hook blocks new upgrades from starting until it is completed. -#### Example Request: +Example Request: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 @@ -223,13 +416,14 @@ cluster: kubernetesVersion: "v1.22.0" ``` -#### Example Response: +Example Response: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 kind: AfterClusterUpgradeResponse status: Success # or Failure message: "error message if status == Failure" +retryAfterSeconds: 10 ``` ### BeforeClusterDelete @@ -238,7 +432,7 @@ This hook is called after the Cluster deletion has been triggered by the user an of the Cluster is going to be deleted. Runtime Extension implementers can use this hook to execute cleanup tasks for the add-ons and block deletion of the Cluster and descendant objects until everything is ready. -#### Example Request: +Example Request: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 @@ -256,7 +450,7 @@ cluster: ... ``` -#### Example Response: +Example Response: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md index 0b4226269d39..943787639380 100644 --- a/docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md @@ -25,6 +25,50 @@ Three different hooks are called as part of Topology Mutation - two in the Clust Please see the corresponding [CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220330-topology-mutation-hook.md) for additional background information. 
+ +* [Implementing Topology Mutation Hook Runtime Extensions](#implementing-topology-mutation-hook-runtime-extensions) + * [Introduction](#introduction) + * [Guidelines](#guidelines) + * [Definitions](#definitions) + * [Inline vs. external patches](#inline-vs-external-patches) + * [External variable definitions](#external-variable-definitions) + * [External variable discovery in the ClusterClass](#external-variable-discovery-in-the-clusterclass) + * [Variable definition conflicts](#variable-definition-conflicts) + * [Setting values for variables in the Cluster](#setting-values-for-variables-in-the-cluster) + * [Using one or multiple external patch extensions](#using-one-or-multiple-external-patch-extensions) + * [Guidelines](#guidelines-1) + * [Patch extension guidelines](#patch-extension-guidelines) + * [Variable discovery guidelines](#variable-discovery-guidelines) + * [Definitions](#definitions-1) + * [GeneratePatches](#generatepatches) + * [ValidateTopology](#validatetopology) + * [DiscoverVariables](#discovervariables) + * [Dealing with Cluster API upgrades with apiVersion bumps](#dealing-with-cluster-api-upgrades-with-apiversion-bumps) + + +## Guidelines + +All guidelines defined in [Implementing Runtime Extensions](implement-extensions.md#guidelines) apply to the +implementation of Runtime Extensions for topology mutation hooks as well. + +In summary, Runtime Extensions are components that should be designed, written and deployed with great caution given +that they can affect the proper functioning of the Cluster API runtime. A poorly implemented Runtime Extension could +potentially block topology reconcile from happening. 
+ +Following recommendations are especially relevant: + +* [Idempotence](implement-extensions.md#idempotence) +* [Avoid side effects](implement-extensions.md#side-effects) +* [Deterministic result](implement-extensions.md#deterministic-result) +* [Error messages](implement-extensions.md#error-messages) +* [Error management](implement-extensions.md#error-management) +* [Avoid dependencies](implement-extensions.md#avoid-dependencies) + +## Definitions + +For additional details about the OpenAPI spec of the topology mutation hooks, please download the [`runtime-sdk-openapi.yaml`]({{#releaselink repo:"https://github.com/kubernetes-sigs/cluster-api" gomodule:"sigs.k8s.io/cluster-api" asset:"runtime-sdk-openapi.yaml" version:"1.11.x"}}) +file and then open it from the [Swagger UI](https://editor.swagger.io/). + ## Inline vs. external patches Inline patches have the following advantages: @@ -157,6 +201,15 @@ Some considerations: * [Conway's law](https://en.wikipedia.org/wiki/Conway%27s_law) might make it not feasible in large organizations to use a single extension. In those cases it's important that boundaries between extensions are clearly defined. + + ## Guidelines For general Runtime Extension developer guidelines please refer to the guidelines in [Implementing Runtime Extensions](implement-extensions.md#guidelines). @@ -199,7 +252,7 @@ so ClusterClass authors can evaluate impacts of changes before performing an upg A GeneratePatches call generates patches for the entire Cluster topology. Accordingly the request contains all templates, the global variables and the template-specific variables. The response contains generated patches. -#### Example request: +Example request: * Generating patches for a Cluster topology is done via a single call to allow External Patch Extensions a holistic view of the entire Cluster topology. Additionally this allows us to reduce the number of round-trips. @@ -233,7 +286,7 @@ items: ... 
``` -#### Example Response: +Example Response: * The response contains patches instead of full objects to reduce the payload. * Templates in the request and patches in the response will be correlated via UIDs. @@ -250,8 +303,6 @@ items: patch: ``` -For additional details, you can see the full schema in . - We are considering to introduce a library to facilitate development of External Patch Extensions. It would provide capabilities like: * Accessing builtin variables * Extracting certain templates from a GeneratePatches request (e.g. all bootstrap templates) @@ -265,7 +316,7 @@ A ValidateTopology call validates the topology after all patches have been appli templates of the Cluster topology, the global variables and the template-specific variables. The response contains the result of the validation. -#### Example Request: +Example Request: * The request is the same as the GeneratePatches request except it doesn't have `uid` fields. We don't need them as we don't have to correlate patches in the response. @@ -296,7 +347,7 @@ items: ... ``` -#### Example Response: +Example Response: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 @@ -305,21 +356,11 @@ status: Success # or Failure message: "error message if status == Failure" ``` -For additional details, you can see the full schema in . - - - ### DiscoverVariables A DiscoverVariables call returns definitions for one or more variables. -#### Example Request: +Example Request: * The request is a simple call to the Runtime hook. @@ -329,7 +370,7 @@ kind: DiscoverVariablesRequest settings: ``` -#### Example Response: +Example Response: ```yaml apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 @@ -379,17 +420,6 @@ variables: ... ``` -For additional details, you can see the full schema in . 
-TODO: Add openAPI definition to the SwaggerUI - - - ## Dealing with Cluster API upgrades with apiVersion bumps There are some special considerations regarding Cluster API upgrades when the upgrade includes a bump diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/implement-upgrade-plan-hooks.md b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-upgrade-plan-hooks.md new file mode 100644 index 000000000000..725e649e707d --- /dev/null +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-upgrade-plan-hooks.md @@ -0,0 +1,171 @@ +# Implementing Upgrade Plan Runtime Extensions + + + +## Introduction + +The proposal for [Chained and efficient upgrades](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md) +introduced support for upgrading by more than one minor version when working with Clusters using managed topologies. + +According to the proposal, there are two ways to provide Cluster API with the information required to compute the upgrade plan: +- By setting the list of versions in the `spec.kubernetesVersions` field in the `ClusterClass` object. +- By calling the runtime hook defined in the `spec.upgrade` field in the `ClusterClass` object. + +This document defines the hook for the second option and provides recommendations on how to implement it. + + +* [Implementing Upgrade Plan Runtime Extensions](#implementing-upgrade-plan-runtime-extensions) + * [Introduction](#introduction) + * [Guidelines](#guidelines) + * [Definitions](#definitions) + * [GenerateUpgradePlan](#generateupgradeplan) + + +## Guidelines + +All guidelines defined in [Implementing Runtime Extensions](implement-extensions.md#guidelines) apply to the +implementation of Runtime Extensions for upgrade plan hooks as well. 
+ +In summary, Runtime Extensions are components that should be designed, written and deployed with great caution given +that they can affect the proper functioning of the Cluster API runtime. A poorly implemented Runtime Extension could +potentially block upgrade transitions from happening. + +Following recommendations are especially relevant: + +* [Idempotence](implement-extensions.md#idempotence) +* [Deterministic result](implement-extensions.md#deterministic-result) +* [Error messages](implement-extensions.md#error-messages) +* [Error management](implement-extensions.md#error-management) +* [Avoid dependencies](implement-extensions.md#avoid-dependencies) + +## Definitions + +For additional details about the OpenAPI spec of the upgrade plan hooks, please download the [`runtime-sdk-openapi.yaml`]({{#releaselink repo:"https://github.com/kubernetes-sigs/cluster-api" gomodule:"sigs.k8s.io/cluster-api" asset:"runtime-sdk-openapi.yaml" version:"1.11.x"}}) +file and then open it from the [Swagger UI](https://editor.swagger.io/). + +### GenerateUpgradePlan + +The GenerateUpgradePlan hook is called every time Cluster API is required to compute the upgrade plan. + +Notably, during an upgrade, the upgrade plan is recomputed several times, ideally once each time the upgrade completes +a step, but the number of calls might be higher depending on, e.g., the duration of the upgrade. + +Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: GenerateUpgradePlanRequest +settings: +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... 
+fromKubernetesVersion: "v1.29.0" +toKubernetesVersion: "v1.33.0" +``` + +Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: GenerateUpgradePlanResponse +status: Success # or Failure +message: "error message if status == Failure" +controlPlaneUpgrades: +- version: v1.30.0 +- version: v1.31.0 +- version: v1.32.3 +- version: v1.33.0 + ``` + +Note: in this case the system will infer the list of intermediate versions for workers from the list of control plane versions, taking +care of performing the minimum number of worker upgrades by taking into account the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/). + +Implementers of this runtime extension can also address more sophisticated use cases by computing the response in different ways, e.g. + +- Go through more patch releases for a minor if necessary, e.g., v1.30.0 -> v1.30.1 -> etc. + + ```yaml + ... + controlPlaneUpgrades: + - version: v1.30.0 + - version: v1.30.1 + - ... + ``` + +Note: in this case the system will infer the list of intermediate versions for workers from the list of control plane versions, taking +care of performing the minimum number of worker upgrades by taking into account the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/). + +- Force workers to upgrade to specific versions, e.g., force workers upgrade to v1.30.0 when doing v1.29.0 -> v1.32.3 + (in this example, worker upgrade to 1.30.0 is not required by the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/), so it would + be skipped under normal circumstances). + + ```yaml + ... 
+ controlPlaneUpgrades: + - version: v1.30.0 + - version: v1.31.0 + - version: v1.32.3 + workersUpgrades: + - version: v1.30.0 + - version: v1.32.3 + ``` + +Note: in this case the system will take into consideration the provided `workersUpgrades`, and validate that it is +consistent with `controlPlaneUpgrades` and also compliant with the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/). + +- Force workers to upgrade to all the intermediate steps (opt out from efficient upgrades). + + ```yaml + ... + controlPlaneUpgrades: + - version: v1.30.0 + - version: v1.31.0 + - version: v1.32.3 + workersUpgrades: + - version: v1.30.0 + - version: v1.31.0 + - version: v1.32.3 + ``` + +Note: in this case the system will take into consideration the provided `workersUpgrades`, and validate that it is +consistent with `controlPlaneUpgrades` and also compliant with the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/). + +In all the cases above, the `GenerateUpgradePlanResponse` content must comply with the following validation rules: + +- `controlPlaneUpgrades` is the list of version upgrade steps for the control plane; it must be always specified + unless the control plane is already at the target version. + - there should be at least one version for every minor between `fromControlPlaneKubernetesVersion` (excluded) and `toKubernetesVersion` (included). + - each version must be: + - greater than `fromControlPlaneKubernetesVersion` (or with a different build number) + - greater than the previous version in the list (or with a different build number) + - less than or equal to `toKubernetesVersion` (or with a different build number) + - the last version in the plan must be equal to `toKubernetesVersion` + +- `workersUpgrades` is the list of version upgrade steps for the workers. 
+ - If the upgrade plan for workers is left empty, the system will automatically + determine the minimal number of worker upgrade steps, thus minimizing impact on workloads and reducing + the overall upgrade time. + - If instead for any reason a custom upgrade plan for workers is required, `workersUpgrades` should be set and + the following rules apply to each version in the list. More specifically, each version must be: + - equal to `fromControlPlaneKubernetesVersion` or to one of the versions in the control plane upgrade plan. + - greater than `fromWorkersKubernetesVersion` (or with a different build number) + - greater than the previous version in the list (or with a different build number) + - less than or equal to the `toKubernetesVersion` (or with a different build number) + - in case of versions with the same major/minor/patch version but different build number, also the order + of those versions must be the same for control plane and worker upgrade plan. + - the last version in the plan must be equal to `toKubernetesVersion` + - the upgrade plan must include all the intermediate versions which workers must go through to avoid breaking rules + defining the max version skew between control plane and workers. 
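The `controlPlaneUpgrades` rules above can be sketched as a small check that an extension author might run before returning a response. This is a minimal illustration and not the validation code used by Cluster API: the `version` type, `parseVersion` and `validateControlPlanePlan` are hypothetical names, only plain `vMAJOR.MINOR.PATCH` versions within one major are handled, and the "different build number" cases are intentionally left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a simplified Kubernetes version (vMAJOR.MINOR.PATCH).
// Assumption: pre-release and build metadata are not handled in this sketch.
type version struct{ major, minor, patch int }

// parseVersion parses strings such as "v1.32.3".
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("invalid version %q: %w", s, err)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// less reports whether a is strictly lower than b.
func (a version) less(b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if a.minor != b.minor {
		return a.minor < b.minor
	}
	return a.patch < b.patch
}

// validateControlPlanePlan applies the controlPlaneUpgrades rules listed above
// to a plan expressed as a list of version strings (hypothetical helper).
func validateControlPlanePlan(from, to string, plan []string) error {
	fromV, err := parseVersion(from)
	if err != nil {
		return err
	}
	toV, err := parseVersion(to)
	if err != nil {
		return err
	}
	if len(plan) == 0 {
		// The plan may be empty only when the control plane is already at the target.
		if fromV == toV {
			return nil
		}
		return fmt.Errorf("controlPlaneUpgrades must be set when upgrading from %s to %s", from, to)
	}
	prev := fromV
	for _, s := range plan {
		v, err := parseVersion(s)
		if err != nil {
			return err
		}
		if v.major != fromV.major {
			return fmt.Errorf("version %s: major version changes are not handled by this sketch", s)
		}
		if !prev.less(v) {
			return fmt.Errorf("version %s must be greater than the previous version", s)
		}
		if toV.less(v) {
			return fmt.Errorf("version %s must be less than or equal to %s", s, to)
		}
		if v.minor > prev.minor+1 {
			return fmt.Errorf("version %s skips at least one minor version", s)
		}
		prev = v
	}
	if prev != toV {
		return fmt.Errorf("the last version in the plan must be equal to %s", to)
	}
	return nil
}

func main() {
	plan := []string{"v1.30.0", "v1.31.0", "v1.32.3", "v1.33.0"}
	if err := validateControlPlanePlan("v1.29.0", "v1.33.0", plan); err != nil {
		fmt.Println("invalid plan:", err)
		return
	}
	fmt.Println("plan is valid")
}
```

Checking each step against the previous one (rather than against the full list of minors) keeps the sketch short: if no step jumps by more than one minor and the last step equals the target, every minor between the starting version and the target is covered.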
diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/index.md b/docs/book/src/tasks/experimental-features/runtime-sdk/index.md index e9185491be03..b048be38532b 100644 --- a/docs/book/src/tasks/experimental-features/runtime-sdk/index.md +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/index.md @@ -31,5 +31,6 @@ Additional documentation: * [Implementing Runtime Extensions](./implement-extensions.md) * [Implementing Lifecycle Hook Extensions](./implement-lifecycle-hooks.md) * [Implementing Topology Mutation Hook Extensions](./implement-topology-mutation-hook.md) + * [Implementing Upgrade Plan Runtime Extensions](./implement-upgrade-plan-hooks.md) * For Cluster operators: * [Deploying Runtime Extensions](./deploy-runtime-extension.md) diff --git a/docs/proposals/20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md b/docs/proposals/20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md index 39fbffbc55af..9e3479dc150d 100644 --- a/docs/proposals/20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md +++ b/docs/proposals/20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md @@ -24,7 +24,6 @@ see-also: * [Summary](#summary) * [Motivation](#motivation) * [Goals](#goals) - * [Future Work](#future-work) * [Non-Goals](#non-goals) * [Proposal](#proposal) * [User Stories](#user-stories) @@ -83,10 +82,6 @@ When using clusters with managed topologies: - Automatically perform chained upgrades in an efficient way by skipping workers upgrades whenever possible. - Allow Cluster API users to influence the upgrade plan e.g. based on availability of machines images for the intermediate versions. -### Future Work - -- Consider if and how to allow users to change the desired version while a chained upgrade is in progress. - ### Non-Goals - Support Kubernetes version downgrades. @@ -361,7 +356,14 @@ That means that by doing any rollout, e.g. 
due to an automatic machine remediati fact that the system can successfully perform an upgrade, or you get the chance to detect and fix issues in the system before a full upgrade is performed. -Conversely, risk increases for users not performing any form of rollouts for long periods. +Conversely, risk increases for users not performing any form of rollouts for long periods, or for users artificially +extending the upgrade duration for a long time by using (or abusing) lifecycle hooks. + +The second point might become critical, depending on the complexity of operations that users perform while the +upgrade is blocked. + +In this regard, the recommendation is to keep the upgrade workflow as simple and as fast as possible, +e.g. combining application upgrades and Kubernetes version upgrades in a single workflow should be avoided. - Upgrading a Cluster by multiple Kubernetes minor versions might compromise workloads. @@ -383,7 +385,7 @@ how a managed topology should behave. ## Upgrade Strategy No particular upgrade considerations are required, this feature will be available to users upgrading to -Cluster API v1.11. +Cluster API v1.12. However, it is required to enhance ClusterClasses with the information required to compute upgrade plans, otherwise the system will keep supporting only upgrade to the next minor for the corresponding clusters (opt-in). @@ -422,6 +424,7 @@ with managed topologies. ## Implementation History -- [ ] 05/05/2023: Proposed idea in https://github.com/kubernetes-sigs/cluster-api/issues/8616 -- [ ] 05/07/2025: Presented proposal at a community meeting -- [ ] 05/13/2025: Open proposal PR +- [x] 05/05/2023: Proposed idea in https://github.com/kubernetes-sigs/cluster-api/issues/8616 +- [x] 05/07/2025: Presented proposal at a community meeting +- [x] 05/13/2025: Open proposal PR +- [x] 12/01/2025: Update the proposal to reflect changes during the implementation