Commit 5928b37

docs(capm3): add Unhealthy Annotation section and update SUMMARY
Signed-off-by: Queensly Acheampongmaa <[email protected]>
1 parent be4c688 commit 5928b37

2 files changed: +59 -1 lines changed


docs/user-guide/src/SUMMARY.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
  - [RAID Setup](bmo/raid.md)
  - [Rebooting Hosts](bmo/reboot_annotation.md)
  - [Specifying Root Device](bmo/root_device_hints.md)
- - [Advanced Features](bmo/advanced-features.md)
+ - [Advanced Features](bmo/advanced-features.md)
  - [Adopting Externally Provisioned Hosts](bmo/externally_provisioned.md)
  - [Advanced Instance Customization](bmo/advanced_instance_customization.md)
  - [Booting from Live ISO](bmo/live-iso.md)

docs/user-guide/src/capm3/introduction.md

Lines changed: 58 additions & 0 deletions
@@ -16,6 +16,64 @@ itself is shared across multiple cloud providers. Cluster API Provider Metal3 is
one of the providers for Cluster API and enables users to deploy a Cluster API based
cluster on top of bare metal infrastructure using Metal3.

## Unhealthy Annotation

The `capi.metal3.io/unhealthy` annotation is used by Cluster API Provider Metal³ (CAPM3)
to mark **BareMetalHost** objects that should not be selected for provisioning new
`Metal3Machine` resources.

When this annotation is present, CAPM3 excludes the annotated host from consideration
when matching available hardware to new Machines. This prevents the reuse of hosts that
are known to be unhealthy or have failed remediation attempts.
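
The exclusion rule described above can be illustrated with a small Python sketch. It is not CAPM3 code: the `selectable_hosts` helper and the sample host dictionaries are hypothetical, and real host selection also weighs other criteria that are not modelled here. The sketch only shows that the presence of the annotation key is enough to skip a host.

```python
UNHEALTHY_ANNOTATION = "capi.metal3.io/unhealthy"


def selectable_hosts(hosts):
    """Return the hosts that do not carry the unhealthy annotation.

    Each host is a dict shaped like a Kubernetes object with an optional
    metadata.annotations mapping. Only the presence of the key is checked.
    (Hypothetical helper for illustration, not part of CAPM3.)
    """
    result = []
    for host in hosts:
        annotations = host.get("metadata", {}).get("annotations") or {}
        if UNHEALTHY_ANNOTATION in annotations:
            continue  # annotated hosts are skipped
        result.append(host)
    return result


# Made-up sample data: one annotated host and two without the annotation.
hosts = [
    {"metadata": {"name": "node-0", "annotations": {UNHEALTHY_ANNOTATION: "true"}}},
    {"metadata": {"name": "node-1"}},
    {"metadata": {"name": "node-2", "annotations": None}},
]
print([h["metadata"]["name"] for h in selectable_hosts(hosts)])
# prints ['node-1', 'node-2']
```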

## Manual usage

Operators can manually mark a host as unhealthy by adding the following annotation to a
`BareMetalHost` object:

```yaml
metadata:
  annotations:
    capi.metal3.io/unhealthy: "true"
```

Removing the annotation re-enables the host for normal provisioning by CAPM3.
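
For operators who script this step, one possible approach is to build a JSON merge patch (RFC 7386) body, where a `null` value removes a key. The sketch below only constructs the patch bodies. The `unhealthy_patch` helper is a hypothetical name, and how the body is sent to the API server is left to whatever client is in use.

```python
import json

UNHEALTHY_ANNOTATION = "capi.metal3.io/unhealthy"


def unhealthy_patch(mark):
    """Build a JSON merge patch body for a BareMetalHost.

    mark=True sets the annotation to "true"; mark=False sets it to null,
    which removes the key when applied as a merge patch.
    (Hypothetical helper for illustration.)
    """
    value = "true" if mark else None
    return json.dumps({"metadata": {"annotations": {UNHEALTHY_ANNOTATION: value}}})


print(unhealthy_patch(True))
# prints {"metadata": {"annotations": {"capi.metal3.io/unhealthy": "true"}}}
print(unhealthy_patch(False))
# prints {"metadata": {"annotations": {"capi.metal3.io/unhealthy": null}}}
```

A body like this could, for example, be passed to `kubectl patch` with `--type merge -p '<body>'` against the target `BareMetalHost`.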

## Automatic application after remediation timeout

Starting from CAPM3 API version `v1alpha4` (available in previous release branches),
this annotation may also be **applied automatically** when remediation attempts time
out and the node fails to recover.

During a remediation cycle managed by a `Metal3Remediation` resource, the following
parameters define retry and timeout behavior:

- `.spec.strategy.retryLimit`: the number of reboot retries permitted before the
remediation is considered failed.
- `.spec.strategy.timeout`: the duration to wait between retries for the node to
become healthy.

If the final timeout expires and the node remains unhealthy:

1. CAPM3 sets the `MachineOwnerRemediatedCondition` condition to `False` on the affected
`Machine`, which starts deletion of the unhealthy `Machine` and its related remediation
objects.
2. The corresponding `BareMetalHost` is automatically annotated with:

```yaml
metadata:
  annotations:
    capi.metal3.io/unhealthy: "true"
```

This automatic annotation ensures that CAPM3 does not immediately attempt to reuse the
same physical host for another Machine after remediation failure. The host remains
excluded from new provisioning until an operator manually removes the annotation after
verifying and correcting the underlying issue.
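
The sequence above can be pictured with a simplified Python model. This is a sketch under stated assumptions, not the controller's implementation: `run_remediation` and `Outcome` are hypothetical names, every reboot is assumed to count toward `retryLimit`, and the wait for each timeout is replaced by a pre-recorded health result per attempt.

```python
from dataclasses import dataclass, field

UNHEALTHY_ANNOTATION = "capi.metal3.io/unhealthy"


@dataclass
class Outcome:
    reboots: int = 0
    recovered: bool = False
    machine_conditions: dict = field(default_factory=dict)
    host_annotations: dict = field(default_factory=dict)


def run_remediation(health_checks, retry_limit):
    """Simplified model of one remediation cycle (illustration only).

    health_checks holds one boolean per elapsed timeout window: True when
    the node was healthy at the end of that window. retry_limit caps the
    number of reboots (assumption: every reboot counts toward the limit).
    """
    outcome = Outcome()
    checks = iter(health_checks)
    while outcome.reboots < retry_limit:
        outcome.reboots += 1      # reboot the host
        if next(checks, False):   # one timeout window passes, then check health
            outcome.recovered = True
            return outcome
    # The final timeout expired and the node is still unhealthy.
    outcome.machine_conditions["MachineOwnerRemediatedCondition"] = "False"
    outcome.host_annotations[UNHEALTHY_ANNOTATION] = "true"
    return outcome


failed = run_remediation([False, False], retry_limit=2)
print(failed.reboots, failed.host_annotations)
# prints 2 {'capi.metal3.io/unhealthy': 'true'}
```

In this model a node that recovers within the retry budget is returned without the condition or the annotation, while a node that never recovers ends up with both.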

Check the [Remediation](https://book.metal3.io/capm3/remediaton/) process for more details.

## Compatibility with Cluster API

| CAPM3 version | Cluster API version | CAPM3 Release |
