Merged
1 change: 1 addition & 0 deletions docs/book/src/SUMMARY.md
@@ -37,6 +37,7 @@
- [Operating a managed Cluster](./tasks/experimental-features/cluster-class/operate-cluster.md)
- [Runtime SDK](tasks/experimental-features/runtime-sdk/index.md)
- [Implementing Runtime Extensions](./tasks/experimental-features/runtime-sdk/implement-extensions.md)
- [Implementing In-Place Update Hooks Extensions](./tasks/experimental-features/runtime-sdk/implement-in-place-update-hooks.md)
- [Implementing Lifecycle Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md)
- [Implementing Topology Mutation Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md)
- [Implementing Upgrade Plan Runtime Extensions](./tasks/experimental-features/runtime-sdk/implement-upgrade-plan-hooks.md)
32 changes: 30 additions & 2 deletions docs/book/src/developer/providers/contracts/control-plane.md
@@ -68,6 +68,7 @@ repo or add an item to the agenda in the [Cluster API community meeting](https:/
| [ControlPlane: version] | No | Mandatory if control plane allows direct management of the Kubernetes version in use; Mandatory for cluster class support. |
| [ControlPlane: machines] | No | Mandatory if control plane instances are represented with a set of Cluster API Machines. |
| [ControlPlane: initialization completed] | Yes | |
| [ControlPlane: in-place updates] | No | Only supported for control plane providers with control plane machines |
| [ControlPlane: conditions] | No | |
| [ControlPlane: terminal failures] | No | |
| [ControlPlaneTemplate, ControlPlaneTemplateList resource definition] | No | Mandatory for ClusterClasses support |
@@ -616,8 +617,34 @@ the ControlPlane resource will be ignored.

</aside>

### ControlPlane: in-place updates

If a control plane provider wants to support in-place updates, please check the [proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates.md).

Supporting in-place updates requires:
- implementing the call to the registered `CanUpdateMachine` hook when performing the "can update in-place" decision.
- when it is decided to perform the in-place update:
  - the machine spec must be updated to the desired state, as well as the spec of the corresponding infrastructure machine and bootstrap config
  - while updating those objects, the `in-place-updates.internal.cluster.x-k8s.io/update-in-progress` annotation must also be set
  - once all objects are updated, the `UpdateMachine` hook must be set as pending on the machine object

After the above steps are completed, the machine controller will take over and complete the in-place update.
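
The ordering of the steps above can be sketched as follows. This is a minimal illustration only, not the KCP implementation: the `object` type, the `triggerInPlaceUpdate` function, the empty annotation value, and the slice used to track pending hooks are all hypothetical stand-ins, and the sketch also assumes the annotation is set on every updated object. A real provider works with Kubernetes API objects and must handle conflicts and re-entrancy.

```go
package main

import "fmt"

// Annotation named in the contract above.
const updateInProgress = "in-place-updates.internal.cluster.x-k8s.io/update-in-progress"

// object is a hypothetical stand-in for a Machine, infrastructure machine
// or bootstrap config; real providers work with Kubernetes API objects.
type object struct {
	Spec        string
	Annotations map[string]string
}

// triggerInPlaceUpdate moves every object to its desired spec and returns
// the hooks to record as pending on the Machine. It assumes objs and
// desired have the same length; pending holds the hooks already pending.
func triggerInPlaceUpdate(objs []*object, desired []string, pending []string) []string {
	// First, update every object and annotate it while doing so.
	for i, o := range objs {
		o.Spec = desired[i]
		if o.Annotations == nil {
			o.Annotations = map[string]string{}
		}
		o.Annotations[updateInProgress] = "" // assumption: the value is not significant
	}
	// Only once all objects are updated, mark UpdateMachine as pending.
	// Skipping duplicates keeps the function safe to call again.
	for _, h := range pending {
		if h == "UpdateMachine" {
			return pending
		}
	}
	return append(pending, "UpdateMachine")
}

func main() {
	machine, infra, bootstrap := &object{Spec: "v1"}, &object{Spec: "v1"}, &object{Spec: "v1"}
	objs := []*object{machine, infra, bootstrap}
	desired := []string{"v2", "v2", "v2"}

	pending := triggerInPlaceUpdate(objs, desired, nil)
	pending = triggerInPlaceUpdate(objs, desired, pending) // calling again changes nothing
	fmt.Println(machine.Spec, len(pending))
}
```

Marking the hook as pending last means the machine controller only takes over after all three objects already describe the desired state.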

<aside class="note warning">

<h1>High complexity</h1>

Implementing the in-place update transition in a race condition-free, re-entrant way is more complex than it might seem.

Please read the proposal's [implementation notes](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates-implementation-notes.md)
carefully.

Also, it is highly recommended to use the KCP implementation as a reference.

</aside>


### ControlPlane: conditions

According to [Kubernetes API Conventions], Conditions provide a standard mechanism for higher-level
status reporting from a controller.
@@ -873,7 +900,8 @@ is implemented in ControlPlane controllers:
[ControlPlane: machines]: #controlplane-machines
[In place propagation of changes affecting Kubernetes objects only]: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20221003-In-place-propagation-of-Kubernetes-objects-only-changes.md
[ControlPlane: version]: #controlplane-version
[ControlPlane: initialization completed]: #controlplane-initialization-completed
[ControlPlane: in-place updates]: #controlplane-in-place-updates
[ControlPlane: conditions]: #controlplane-conditions
[Kubernetes API Conventions]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
[Improving status in CAPI resources]: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240916-improve-status-in-CAPI-resources.md
16 changes: 16 additions & 0 deletions docs/book/src/reference/glossary.md
@@ -281,6 +281,12 @@ are propagated in place by CAPI controllers to avoid the more elaborated mechani
They include metadata, MinReadySeconds, NodeDrainTimeout, NodeVolumeDetachTimeout and NodeDeletionTimeout but are
not limited to be expanded in the future.

### In-place update

Any change to a Machine spec that is performed without deleting the Machine and creating a new one.

Note: changing [in-place mutable fields](#in-place-mutable-fields) is not considered an in-place update.

### Instance

see [Server](#server)
@@ -289,6 +295,8 @@ see [Server](#server)

A resource that does not mutate. In Kubernetes we often state the instance of a running pod is immutable or does not change once it is run. In order to make a change, a new pod is run. In the context of [Cluster API](#cluster-api) we often refer to a running instance of a [Machine](#machine) as being immutable, from a [Cluster API](#cluster-api) perspective.

Note: Cluster API also has extensibility points that make it possible to perform [in-place updates](#in-place-update) of machines.

### IPAM provider

Refers to a [provider](#provider) that allows Cluster API to interact with IPAM solutions.
@@ -480,6 +488,14 @@ See [Topology Mutation](../tasks/experimental-features/runtime-sdk/implement-top
# U
---

### Update Extension

A [runtime extension provider](#runtime-extension-provider) that implements [Update Lifecycle Hooks](#update-lifecycle-hooks).

### Update Lifecycle Hooks
A set of Cluster API [Runtime Hooks](#runtime-hook) called when performing the "can update in-place" decision or
when performing an [in-place update](#in-place-update).

### Upgrade plan
The sequence of intermediate versions ... target version that a Cluster must upgrade to when
performing a [chained upgrade](#chained-upgrade).
@@ -5,6 +5,9 @@ temporary location for features which will be moved to their permanent locations

Currently Cluster API has the following experimental features:
* `ClusterTopology` (env var: `CLUSTER_TOPOLOGY`): [ClusterClass](./cluster-class/index.md)
* `InPlaceUpdates` (env var: `EXP_IN_PLACE_UPDATES`):
  * Allows users to execute changes on existing machines without deleting them and creating new ones.
  * See the [proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates.md) for more details.
* `KubeadmBootstrapFormatIgnition` (env var: `EXP_KUBEADM_BOOTSTRAP_FORMAT_IGNITION`): [Ignition](./ignition.md)
* `MachinePool` (env var: `EXP_MACHINE_POOL`): [MachinePools](./machine-pools.md)
* `MachineSetPreflightChecks` (env var: `EXP_MACHINE_SET_PREFLIGHT_CHECKS`): [MachineSetPreflightChecks](./machineset-preflight-checks.md)
@@ -0,0 +1,279 @@
# Implementing in-place update hooks

<aside class="note warning">

<h1>Caution</h1>

Please note Runtime SDK is an advanced feature. If implemented incorrectly, a failing Runtime Extension can severely impact the Cluster API runtime.

</aside>

## Introduction

The proposal for [in-place updates in Cluster API](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates.md)
introduced extensions that allow users to execute changes on existing machines without deleting them and creating new ones.

Notably, the Cluster API user experience remains the same as today regardless of whether the in-place update feature is enabled,
e.g. in order to trigger a MachineDeployment rollout, you have to rotate a template, etc.

Users should care ONLY about the desired state (as of today).

Cluster API is responsible for choosing the best strategy to achieve the desired state, and with the introduction of
update extensions, Cluster API is expanding the set of tools that can be used to achieve it.

If external update extensions cannot cover all of the desired changes, CAPI will fall back to Cluster API’s default,
immutable rollouts.

Cluster API is also responsible for determining which Machine/MachineSet should be updated, as well as for handling rollout
options like MaxSurge/MaxUnavailable. In this regard:

- Machines updating in-place are considered not available, because in-place updates are always considered potentially disruptive.
- For control plane machines, if maxSurge is one, a new machine must be created first; then, as soon as there is
  “buffer” for in-place, the in-place update can proceed.
- KCP will not use in-place updates if it detects that doing so can impact the health of the control plane.
- For worker machines, if maxUnavailable is zero, a new machine must be created first; then, as soon as there
  is “buffer” for in-place, the in-place update can proceed.
- When in-place is possible, the system should try to in-place update as many machines as possible.
  In practice, this means that maxSurge might not be fully used (it is used only to scale up by one if maxUnavailable=0).
- No in-place updates are performed for worker machines when using the rollout strategy `OnDelete`.
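
The “buffer” mentioned in the list above can be illustrated with a small calculation. This is a simplified sketch under stated assumptions, not the logic of the Cluster API controllers: the function name `inPlaceBudget` and its parameters are hypothetical, and the only rules it models are that machines updating in-place count as not available and that availability must not drop below `replicas - maxUnavailable`.

```go
package main

import "fmt"

// inPlaceBudget returns how many Machines could start an in-place update
// right now without dropping below replicas-maxUnavailable available
// Machines. A Machine that is updating in-place counts as not available.
func inPlaceBudget(replicas, maxUnavailable, available int) int {
	minAvailable := replicas - maxUnavailable
	budget := available - minAvailable
	if budget < 0 {
		return 0
	}
	return budget
}

func main() {
	// maxUnavailable=0 and all 3 replicas available: no buffer yet,
	// so a new Machine has to be created first.
	fmt.Println(inPlaceBudget(3, 0, 3))
	// After scaling up by one there are 4 available Machines:
	// one of them can now be updated in-place.
	fmt.Println(inPlaceBudget(3, 0, 4))
	// With maxUnavailable=1 there is buffer from the start.
	fmt.Println(inPlaceBudget(3, 1, 3))
}
```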

<aside class="note warning">

<h1>Important!</h1>

Cluster API will call the in-place extensions only if the `InPlaceUpdates` feature flag is enabled.

Also, please note that the current implementation of the [in-place updates proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates.md) only allows registering one extension for the `CanUpdateMachine`, `CanUpdateMachineSet` and `UpdateMachine` hooks.

</aside>

<!-- TOC -->
* [Implementing in-place update hooks](#implementing-in-place-update-hooks)
  * [Introduction](#introduction)
  * [Guidelines](#guidelines)
  * [Definitions](#definitions)
    * [CanUpdateMachine](#canupdatemachine)
    * [CanUpdateMachineSet](#canupdatemachineset)
    * [UpdateMachine](#updatemachine)
<!-- TOC -->

## Guidelines

All guidelines defined in [Implementing Runtime Extensions](implement-extensions.md#guidelines) apply to the
implementation of Runtime Extensions for in-place update hooks as well.

In summary, Runtime Extensions are components that should be designed, written and deployed with great caution given
that they can affect the proper functioning of the Cluster API runtime. A poorly implemented Runtime Extension could
potentially block updates from happening.

The following recommendations are especially relevant:

* [Timeouts](implement-extensions.md#timeouts)
* [Idempotence](implement-extensions.md#idempotence)
* [Deterministic result](implement-extensions.md#deterministic-result)
* [Error messages](implement-extensions.md#error-messages)
* [Error management](implement-extensions.md#error-management)
* [Avoid dependencies](implement-extensions.md#avoid-dependencies)

## Definitions

For additional details about the OpenAPI spec of the in-place update hooks, please download the [`runtime-sdk-openapi.yaml`]({{#releaselink repo:"https://github.com/kubernetes-sigs/cluster-api" gomodule:"sigs.k8s.io/cluster-api" asset:"runtime-sdk-openapi.yaml" version:"1.11.x"}})
file and then open it from the [Swagger UI](https://editor.swagger.io/).

### CanUpdateMachine

This hook is called by KCP when performing the "can update in-place" decision for a control plane machine.

Example request:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: CanUpdateMachineRequest
settings: <Runtime Extension settings>
current:
  machine:
    apiVersion: cluster.x-k8s.io/v1beta2
    kind: Machine
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
  infrastructureMachine:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachine
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
  bootstrapConfig:
    apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
    kind: KubeadmConfig
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
desired:
  machine:
    ...
  infrastructureMachine:
    ...
  bootstrapConfig:
    ...
```

Note:
- All the objects will have the latest API version known by Cluster API.
- Only spec is provided; status fields are not included.
- In a future release, when registering more than one extension for the `CanUpdateMachine` hook is supported, the current state will already include changes that can be handled in-place by other runtime extensions.

Example Response:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: CanUpdateMachineResponse
status: Success # or Failure
message: "error message if status == Failure"
machinePatch:
  patchType: JSONPatch
  patch: <JSON-patch>
infrastructureMachinePatch:
  ...
bootstrapConfigPatch:
  ...
```

Note:
- Extensions should return per-object patches to be applied on current objects to indicate which changes they can handle in-place.
- Only fields in the Machine/InfraMachine/BootstrapConfig spec have to be covered by patches.
- Patches must be in JSONPatch or JSONMergePatch format.
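
As an illustration of the JSONPatch format, the sketch below builds an RFC 6902 patch covering only the spec fields an extension declares it can handle in-place. It uses only the Go standard library; the `patchOp` type, the `buildPatch` function, the flat `map[string]string` view of a spec, and the `version`/`failureDomain` values are assumptions made for the example and are not part of the Runtime SDK. How the patch bytes are encoded into the `patch` field of the response is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// buildPatch returns a JSON Patch that moves current towards desired, but
// only for the top-level spec fields listed in supported. Changes to other
// fields are left out, signalling that this extension cannot handle them.
func buildPatch(current, desired map[string]string, supported []string) ([]byte, error) {
	ops := []patchOp{}
	for _, field := range supported {
		want, ok := desired[field]
		if !ok || current[field] == want {
			continue // field not set in desired, or nothing to change
		}
		op := "replace"
		if _, exists := current[field]; !exists {
			op = "add" // "replace" requires the path to exist already
		}
		ops = append(ops, patchOp{Op: op, Path: "/spec/" + field, Value: want})
	}
	return json.Marshal(ops)
}

func main() {
	current := map[string]string{"version": "v1.30.0", "failureDomain": "zone-a"}
	desired := map[string]string{"version": "v1.31.0", "failureDomain": "zone-b"}

	// This hypothetical extension can only change the Kubernetes version.
	patch, err := buildPatch(current, desired, []string{"version"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
	// prints [{"op":"replace","path":"/spec/version","value":"v1.31.0"}]
}
```

In this example the `failureDomain` change is not covered by the patch, so the desired state is not fully reachable in-place and Cluster API would fall back to a regular rollout.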

### CanUpdateMachineSet

This hook is called by the MachineDeployment controller when performing the "can update in-place" decision for all the Machines controlled by
a MachineSet.

Example request:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: CanUpdateMachineSetRequest
settings: <Runtime Extension settings>
current:
  machineSet:
    apiVersion: cluster.x-k8s.io/v1beta2
    kind: MachineSet
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
  infrastructureMachineTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachineTemplate
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
  bootstrapConfigTemplate:
    apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
    kind: KubeadmConfigTemplate
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
desired:
  machineSet:
    ...
  infrastructureMachineTemplate:
    ...
  bootstrapConfigTemplate:
    ...
```

Note:
- All the objects will have the latest API version known by Cluster API.
- Only spec is provided; status fields are not included.
- In a future release, when registering more than one extension for the `CanUpdateMachineSet` hook is supported, the current state will already include changes that can be handled in-place by other runtime extensions.

Example Response:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: CanUpdateMachineSetResponse
status: Success # or Failure
message: "error message if status == Failure"
machineSetPatch:
  patchType: JSONPatch
  patch: <JSON-patch>
infrastructureMachineTemplatePatch:
  ...
bootstrapConfigTemplatePatch:
  ...
```

Note:
- Extensions should return per-object patches to be applied on current objects to indicate which changes they can handle in-place.
- Only fields in the MachineSet/InfraMachineTemplate/BootstrapConfigTemplate spec have to be covered by patches.
- Patches must be in JSONPatch or JSONMergePatch format.
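
For the JSONMergePatch format, a patch is simply a partial document containing the fields to change (RFC 7386). The sketch below builds one with the Go standard library; the `mergePatchFor` helper and the choice of `spec.template.spec.version` as the field being changed are assumptions made for the example, not something defined by the Runtime SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatchFor nests value under the given path of object keys, producing
// an RFC 7386 JSON Merge Patch document that sets only that one field.
func mergePatchFor(path []string, value any) ([]byte, error) {
	var doc any = value
	// Wrap the value from the innermost key outwards.
	for i := len(path) - 1; i >= 0; i-- {
		doc = map[string]any{path[i]: doc}
	}
	return json.Marshal(doc)
}

func main() {
	// Hypothetical example: the extension can change the Kubernetes
	// version of the Machines in a MachineSet in-place.
	patch, err := mergePatchFor([]string{"spec", "template", "spec", "version"}, "v1.31.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
	// prints {"spec":{"template":{"spec":{"version":"v1.31.0"}}}}
}
```

A merge patch is usually easier to read than a JSONPatch for a handful of scalar fields, but it cannot express changes to individual list elements, so JSONPatch may be the better fit when list entries change.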

### UpdateMachine

This hook is called by the Machine controller when performing an in-place update for a Machine.

Example request:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: UpdateMachineRequest
settings: <Runtime Extension settings>
desired:
  machine:
    apiVersion: cluster.x-k8s.io/v1beta2
    kind: Machine
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
  infrastructureMachineTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachineTemplate
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
  bootstrapConfigTemplate:
    apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
    kind: KubeadmConfigTemplate
    metadata:
      name: test-cluster
      namespace: test-ns
    spec:
      ...
```

Note:
- Only the desired state is provided (the external updater extension should know the current state of the Machine).
- Only spec is provided; status fields are not included.

Example Response:

```yaml
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: UpdateMachineResponse
status: Success # or Failure
message: "error message if status == Failure"
retryAfterSeconds: 10
```

Note:
- The status of the update operation is determined by the CommonRetryResponse fields:
  - Status=Success + RetryAfterSeconds > 0: update is in progress
  - Status=Success + RetryAfterSeconds = 0: update completed successfully
  - Status=Failure: update failed
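
The three cases above can be expressed as a small decision function. This is a sketch for illustration only: the `UpdateState` type, its constants and the `interpret` function are hypothetical names, and treating any status other than `Success` as a failure is an assumption of the sketch.

```go
package main

import "fmt"

// UpdateState is a hypothetical name for the three outcomes listed above.
type UpdateState string

const (
	UpdateInProgress UpdateState = "InProgress"
	UpdateDone       UpdateState = "Done"
	UpdateFailed     UpdateState = "Failed"
)

// interpret maps the status and retryAfterSeconds fields of an
// UpdateMachine response to one of the three outcomes.
func interpret(status string, retryAfterSeconds int32) UpdateState {
	if status != "Success" {
		// Assumption: anything that is not Success counts as a failure.
		return UpdateFailed
	}
	if retryAfterSeconds > 0 {
		// The extension asks to be called again: still in progress.
		return UpdateInProgress
	}
	return UpdateDone
}

func main() {
	fmt.Println(interpret("Success", 10)) // InProgress
	fmt.Println(interpret("Success", 0))  // Done
	fmt.Println(interpret("Failure", 0))  // Failed
}
```

An extension performing a long-running update would therefore keep answering `Success` with a positive `retryAfterSeconds` until the update is finished, and only then return `Success` with `retryAfterSeconds: 0`.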
@@ -29,6 +29,7 @@ Additional documentation:
* [Runtime Hooks for Add-on Management CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md)
* For Runtime Extension developers:
* [Implementing Runtime Extensions](./implement-extensions.md)
* [Implementing In-Place Update Hooks Extensions](./implement-in-place-update-hooks.md)
* [Implementing Lifecycle Hook Extensions](./implement-lifecycle-hooks.md)
* [Implementing Topology Mutation Hook Extensions](./implement-topology-mutation-hook.md)
* [Implementing Upgrade Plan Runtime Extensions](./implement-upgrade-plan-hooks.md)