Skip to content

Commit 0d93824

Browse files
committed
first review update
1 parent ff2f0fa commit 0d93824

File tree

1 file changed

+37
-25
lines changed

1 file changed

+37
-25
lines changed

docs/proposals/20250513-propogate-taints.md

Lines changed: 37 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -72,7 +72,7 @@ This proposal introduces taints as a first-class citizen in Cluster API's core t
7272
The proposal defines two different kind of propagation modes for taints:
7373

7474
- **Always**: Taints are continuously reconciled and maintained on nodes
75-
- **Initialize**: Taints are set once during node initialization and then left unmanaged
75+
- **OnInitialization**: Taints are set once during node initialization and then left unmanaged
7676

7777
NOTE: This new proposal has been created rather than updating the prior [in-place metadata propagation](20221003-In-place-propagation-of-Kubernetes-objects-only-changes.md) proposal because taints are not yet part of the Core Provider's API types and are different enough from labels or annotations that a different set of constraints will need to be considered.
7878
Very early versions of Kubernetes tracked taints as annotations, but they have long since been [promoted to their own API type](https://github.com/kubernetes/kubernetes/commit/9b640838a5f5e28db1c1f084afa393fa0b6d1166)
@@ -156,16 +156,16 @@ It also means that Server-Side Apply ownership rules could not be applied to ind
156156

157157
For Cluster API to support propagating `Taint`s, it will need to:
158158

159-
- implement its own mechanism for tracking what `Taint`s it owns.
159+
- implement its own mechanism for tracking what `Taint`s it owns. This will be similar to the implementation for annotations and labels. See [Changes to the Machine and MachinePool controllers](#changes-to-the-machine-and-machinepool-controllers) for more details.
160160
- ensure there are no conflicts with other actors by always setting the `metadata.resourceVersion` on API calls changing the taints on a Node.
161161

162162
#### Propagation of taints
163163

164164
A taint for a Node may be defined for different use-cases:
165165

166-
- Taint's supposed to stay on the Node to ensure only certain workload runs on such a `Node` aka. `Always`:
167-
- This taint are supposed to be set on the `Node` object as long as it is defined on its parent core CAPI object.
168-
- Example: Nodes where only run GPU related workload should run
166+
- Taints supposed to stay on the Node to ensure only certain workload runs on such a `Node` aka. `Always`:
167+
- These taints are supposed to be set on the `Node` object as long as it is defined on its parent core CAPI object.
168+
- Example: Nodes where only GPU related workload should run
169169
- CAPI controllers should ensure via reconciliation that such a taint gets added again in case a third party removes it.
170170
- Taint's supposed to get added once to a Node aka. `Initialize`
171171
- This taints are supposed to be set **once** on the `Node` object.
@@ -200,7 +200,7 @@ type MachineTaint struct {
200200

201201
// propagation defines how this taint should be propagated to Nodes.
202202
// Always: The taint will be continuously reconciled. If removed from a node, it will be re-added.
203-
// Initialize: The taint will be added during node initialization. If removed, it will not be re-added.
203+
// OnInitialization: The taint will be added during node initialization. If removed, it will not be re-added.
204204
// +required
205205
Propagation MachineTaintPropagation `json:"propagation"`
206206
}
@@ -214,23 +214,14 @@ const (
214214
// If removed, it will be re-added during reconciliation.
215215
TaintPropagationAlways MachineTaintPropagation = "Always"
216216

217-
// TaintPropagationInitialize means the taint should be set once during initialization and then
217+
// TaintPropagationOnInitialization means the taint should be set once during initialization and then
218218
// left alone. If removed, it will not be re-added.
219-
TaintPropagationInitialize MachineTaintPropagation = "Initialize"
219+
TaintPropagationOnInitialization MachineTaintPropagation = "OnInitialization"
220220
)
221221
```
222222

223-
Proper validations on the new field should ensure that no taint with a key of `node.cluster.x-k8s.io/uninitialized` is getting added, because that one is restricted to be used by CAPI and providers.
224-
225-
Open question: alternative to consider, e.g. specific struct, (negative: not as extensible):
226-
* example
227-
```golang
228-
type MachineTaints struct{
229-
Always []corev1.Taint `json:"always,omitempty"`
230-
Initialization []corev1.Taint `json:"initialization,omitempty"`
231-
}
232-
```
233-
* FIXME(chrischdi): if decided this should move down to alternative approaches.
223+
Proper validations on the new field should ensure that no taint with a key of `node.cluster.x-k8s.io/uninitialized` or `node.cluster.x-k8s.io/outdated-revision` is getting added, because that one is restricted to be used by CAPI and providers.
224+
Other taints normally set by kubeadm should be able to get set.
234225

235226
##### Changes to the Machine, MachineSet, MachineDeployment and MachinePool resources via MachineSpec
236227

@@ -258,7 +249,6 @@ type MachineSpec struct{
258249
// +optional
259250
// +listType=map
260251
// +listMapKey=key
261-
// +listMapKey=value
262252
// +listMapKey=effect
263253
Taints []MachineTaint `json:"taints,omitempty"`
264254

@@ -285,7 +275,7 @@ The propagation of the fields should follow the prior art and is summarized in t
285275

286276
![propagation of taints across a topology Cluster](./images/propagate-taints/topology-propagation.excalidraw.png)
287277

288-
As implemented for the ReadinessGates, add taints on a Cluster should lead to only adding the taints from the ClusterClass.
278+
As implemented for the ReadinessGates, add taints on a Cluster should lead to only adding the taints from the Cluster.
289279

290280
```golang
291281
type ControlPlaneClass struct {
@@ -361,6 +351,12 @@ The MachineDeployment and MachineSet controllers will watch their associated `Ma
361351

362352
This should be done similar to how the existing in-place mutable fields like `ReadinessGates`, `NodeDrainTimeout`, etc. for a Machine are handled today.
363353

354+
##### Changes to the cluster-autoscaler
355+
356+
The cluster-autoscaler implementation for CAPI as of today consumes the `capacity.cluster-autoscaler.kubernetes.io/taints` (see [Pre-defined labels and taints on nodes scaled from zero](https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#pre-defined-labels-and-taints-on-nodes-scaled-from-zero)).
357+
358+
It should be adjusted to also consider the cofnigured taints.
359+
364360
### Security Model
365361

366362
Users who can define `Taint`s that get placed on `Node`s will be able to steer workloads, possibly to malicious hosts in order to extract sensitive data.
@@ -372,10 +368,10 @@ This proposal therefore does not present any heightened security requirements th
372368

373369
With the introduction of the new fields, bootstrap providers could start deprecating their equivalent fields.
374370

375-
For CABPK, adding a taint at a KubeadmConfigTemplate at `.spec.template.spec.joinConfiguration.nodeRegistration.taints`. is almost equivalent to adding a `Initialization` typed taint by the new API.
371+
For CABPK, adding a taint at a KubeadmConfigTemplate at `.spec.template.spec.joinConfiguration.nodeRegistration.taints`. is almost equivalent to adding a `OnInitialization` typed taint by the new API.
376372
The only difference is that the taint does get added by the controllers after the Node joined the Cluster.
377373

378-
This is considered okay, because workload should not be able to be schedule workload unless the `node.cluster.x-k8s.io/uninitialized` taint was removed and the implementation should take care to have added the `Initialization` taints before or at the time removing this taint.
374+
This is considered okay, because workload should not be able to be schedule workload unless the `node.cluster.x-k8s.io/uninitialized` taint was removed and the implementation should take care to have added the `OnInitialization` taints before or at the time removing this taint.
379375

380376
### Risks and Mitigations
381377

@@ -387,6 +383,8 @@ This risk can be mitigated by ensuring Cluster API only modifies taints that it
387383

388384
## Alternatives
389385

386+
### Continuous reconciliation
387+
390388
Deciding whether or not to reconcile taint changes continuously has been a challenge for the Cluster API.
391389
Historically, the v1alpha1 API included a `Machine.Taints` field.
392390
However, since this field was mostly used in cluster bootstrapping, it was later extracted into bootstrap provider implementations.
@@ -398,6 +396,19 @@ While this simplifies the implementation logic for Cluster API, it may be surpri
398396
This would also mean that there are two different behaviors when modifying metadata within Cluster API, which could again be very confusing.
399397
There is already precedent for leaving infrastructure in-place when Kubernetes-only fields are modified, and this proposal seeks to align with the established function.
400398

399+
### Type definition of a taint in Cluster API objects
400+
401+
An alternative to define the taints as writtetn above is using slices per type as follows:
402+
403+
```golang
404+
type MachineTaints struct{
405+
Always []corev1.Taint `json:"always,omitempty"`
406+
OnInitialization []corev1.Taint `json:"onInitialization,omitempty"`
407+
}
408+
```
409+
410+
However this was considered as not being as expressive and extensible as the above.
411+
401412
## Upgrade Strategy
402413

403414
Taint support is a net-new field, and therefore must be optional and not affect upgrades.
@@ -418,9 +429,10 @@ This proposal should be unaffected regardless of any upstream change to the hand
418429

419430
### Graduation Criteria [optional]
420431

421-
Open question: Should we feature-gate this new field?
432+
The feature should be implemented behind a feature gate.
433+
If deactivated, the corresponding controlers should block setting the fields via webhooks and not run the corresponding logic in the controllers.
422434

423-
Considering that "not setting the fields" should be equivalent to "disabling the feature gate" seems to be good enough.
435+
This allows to gain experience and the ability to do adjustments and graduate the feature over time.
424436

425437
<!--
426438
**Note:** *Section not required until targeted at a release.*

0 commit comments

Comments
 (0)