docs/user-guide/src/SUMMARY.md
- [RAID Setup](bmo/raid.md)
- [Rebooting Hosts](bmo/reboot_annotation.md)
- [Specifying Root Device](bmo/root_device_hints.md)
- [Advanced Features](bmo/advanced-features.md)
- [Adopting Externally Provisioned Hosts](bmo/externally_provisioned.md)
- [Advanced Instance Customization](bmo/advanced_instance_customization.md)
- [Booting from Live ISO](bmo/live-iso.md)
docs/user-guide/src/capm3/introduction.md

itself is shared across multiple cloud providers. Cluster API Provider Metal3 is
one of the providers for Cluster API and enables users to deploy a Cluster API based
cluster on top of bare metal infrastructure using Metal3.

## Unhealthy annotation

Cluster API Provider Metal3 (CAPM3) uses the `capi.metal3.io/unhealthy` annotation
to mark `BareMetalHost` objects that must not be selected when provisioning new
`Metal3Machine` resources.

When this annotation is present, CAPM3 excludes the annotated host from consideration
when matching available hardware to new Machines. This prevents the reuse of hosts that
are known to be unhealthy or have failed remediation attempts.

### Manual usage

Operators can manually mark a host as unhealthy by adding the following annotation to a
`BareMetalHost` object:

```yaml
metadata:
  annotations:
    capi.metal3.io/unhealthy: "true"
```

Removing the annotation re-enables the host for normal provisioning by CAPM3.
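As a minimal sketch, the annotation can be removed with a JSON merge patch (RFC 7386), in which a `null` value deletes the key. Written as YAML, such a patch looks like the block below; it could be applied with `kubectl patch baremetalhost <name> --type merge --patch-file unhealthy-remove.yaml`, where the file name and `<name>` are placeholders. Equivalently, `kubectl annotate` with a trailing `-` on the annotation key removes it.

```yaml
# JSON merge patch written as YAML: a null value deletes the annotation key.
metadata:
  annotations:
    capi.metal3.io/unhealthy: null
```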

### Automatic application after remediation timeout

Since CAPM3 API version `v1alpha4`, and therefore also in earlier release branches that
ship that API version, this annotation can be **applied automatically** when remediation
attempts time out and the node fails to recover.

During a remediation cycle managed by a `Metal3Remediation` resource, the following
parameters define retry and timeout behavior:

- `.spec.strategy.retryLimit` — the number of reboot retries permitted before the
remediation is considered failed.
- `.spec.strategy.timeout` — the duration to wait between retries for the node to
become healthy.
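These fields are typically set through a `Metal3RemediationTemplate` that a Cluster API `MachineHealthCheck` references; `Metal3Remediation` objects are then created from it, so the template's `.spec.template.spec.strategy` corresponds to `.spec.strategy` on each `Metal3Remediation`. The sketch below assumes the CAPM3 `v1beta1` API; the name, namespace, and values (two retries, five-minute timeout) are illustrative only.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3RemediationTemplate
metadata:
  name: worker-remediation-template   # hypothetical name
  namespace: metal3                   # hypothetical namespace
spec:
  template:
    spec:
      strategy:
        type: Reboot     # reboot-based remediation
        retryLimit: 2    # maps to .spec.strategy.retryLimit on each Metal3Remediation
        timeout: 5m0s    # maps to .spec.strategy.timeout (wait per retry)
```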

If the final timeout expires and the node remains unhealthy:

1. CAPM3 sets the `MachineOwnerRemediatedCondition=False` condition on the affected
`Machine` to begin deletion of the unhealthy `Machine` and related remediation
objects.
2. The corresponding `BareMetalHost` is automatically annotated with:

   ```yaml
   metadata:
     annotations:
       capi.metal3.io/unhealthy: "true"
   ```

This automatic annotation prevents CAPM3 from reusing the same physical host for another
Machine after a failed remediation. The host stays excluded from new provisioning until an
operator has verified and corrected the underlying issue and then manually removes the
annotation.

See [Remediation](https://book.metal3.io/capm3/remediaton/) for more details on the remediation process.

## Compatibility with Cluster API

| CAPM3 version | Cluster API version | CAPM3 Release |