@@ -16,6 +16,64 @@ itself is shared across multiple cloud providers. Cluster API Provider Metal3 is
1616one of the providers for Cluster API and enables users to deploy a Cluster API based
1717cluster on top of bare metal infrastructure using Metal3.
1818
19+
20+ ## Unhealthy Annotation
21+
22+ The `capi.metal3.io/unhealthy` annotation is used by Cluster API Provider Metal³ (CAPM3)
23+ to mark **BareMetalHost** objects that should not be selected for provisioning new
24+ `Metal3Machine` resources.
25+
26+ When this annotation is present, CAPM3 excludes the annotated host from consideration
27+ when matching available hardware to new Machines. This prevents the reuse of hosts that
28+ are known to be unhealthy or have failed remediation attempts.
29+
30+ ## Manual usage
31+
32+ Operators can manually mark a host as unhealthy by adding the following annotation to a
33+ `BareMetalHost` object:
34+
35+ ```yaml
36+ metadata:
37+   annotations:
38+     capi.metal3.io/unhealthy: "true"
39+ ```
40+
41+ Removing the annotation re-enables the host for normal provisioning by CAPM3.
42+
43+ ## Automatic application after remediation timeout
44+
45+ Starting from CAPM3 API version `v1alpha4` (available in previous release branches),
46+ this annotation may also be **applied automatically** when remediation attempts time
47+ out and the node fails to recover.
48+
49+ During a remediation cycle managed by a `Metal3Remediation` resource, the following
50+ parameters define retry and timeout behavior:
51+
52+ - `.spec.strategy.retryLimit`: the number of reboot retries permitted before the
53+   remediation is considered failed.
54+ - `.spec.strategy.timeout`: the duration to wait between retries for the node to
55+   become healthy.
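
As a rough illustration of where these two fields live, the sketch below shows a `Metal3RemediationTemplate` with a reboot strategy. The `apiVersion`, the resource name, the namespace, and the specific values are assumptions chosen for illustration and are not taken from this document; check them against the CAPM3 release in use.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1  # assumed API version
kind: Metal3RemediationTemplate
metadata:
  name: worker-remediation   # hypothetical name
  namespace: metal3          # hypothetical namespace
spec:
  template:
    spec:
      strategy:
        type: Reboot
        retryLimit: 2   # illustrative: two reboot retries permitted
        timeout: 300s   # illustrative: wait five minutes between retries
```

With values like these, the remediation would be considered failed if the node is still unhealthy once the timeout following the last permitted retry expires.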
56+
57+ If the final timeout expires and the node remains unhealthy:
58+
59+ 1. CAPM3 sets the `MachineOwnerRemediatedCondition=False` condition on the affected
60+    `Machine` to begin deletion of the unhealthy `Machine` and related remediation
61+    objects.
62+ 2. The corresponding `BareMetalHost` is automatically annotated with:
63+
64+    ```yaml
65+    metadata:
66+      annotations:
67+        capi.metal3.io/unhealthy: "true"
68+    ```
69+
70+ This automatic annotation ensures that CAPM3 does not immediately attempt to reuse the
71+ same physical host for another Machine after remediation failure. The host remains
72+ excluded from new provisioning until an operator manually removes the annotation after
73+ verifying and correcting the underlying issue.
74+
75+ See the [Remediation](https://book.metal3.io/capm3/remediaton/) documentation for more details.
76+
1977## Compatibility with Cluster API
2078
2179| CAPM3 version | Cluster API version | CAPM3 Release |