Commit be4c688

Merge pull request #588 from Nordix/Sunnatillo/add-fd-doc
Add documentation for Failure Domain feature
2 parents 714e01f + 1ee60e7 commit be4c688

3 files changed: +64 -0 lines changed

docs/user-guide/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@
 - [Label synchronization](capm3/label_sync.md)
 - [Data sources](capm3/data_sources.md)
 - [ClusterClass](capm3/clusterclass.md)
+- [Failure Domain](capm3/failure_domain.md)
 - [Ip-address-manager](ipam/introduction.md)
 - [Install Ip-address-manager](ipam/ipam_installation.md)
 - [Troubleshooting FAQ](troubleshooting.md)

docs/user-guide/src/capm3/failure_domain.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# Failure Domains in Metal3

## What is a Failure Domain?

Failure Domain: A topology label (e.g., row-a, rack-12) grouping hosts that
share a common point of failure.

## Why Failure Domain?

Bare-metal environments often have racks, rows, or sites with different network
setups. The Failure Domain (FD) feature lets users distribute
control-plane nodes across these locations for improved resilience
and fault isolation.

Cluster API (CAPI) supports FDs for control-plane nodes through the
KubeadmControlPlane (KCP) controller. KCP reads the set of FDs from
`ProviderCluster.Spec.FailureDomains`. If defined, these values are copied to
`Cluster.Status.FailureDomains`. KCP then selects an FD from this set
and places its value in `Machine.Spec.FailureDomain`. The CAPM3 machine
controller reads `Machine.Spec.FailureDomain` and copies it to
`Metal3Machine.Spec.FailureDomain`. By default, KCP attempts to balance
control-plane Machines evenly across all defined FDs.

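To make the propagation chain easier to follow, here is a minimal sketch of where the FD value might appear on each object. The resource names are hypothetical, the apiVersions are assumptions, and the lowercase field names are assumed to be the serialized forms of the fields named above. Only the relevant fields are shown, so these are excerpts and not complete manifests.

```yaml
# Illustrative excerpts only; names and apiVersions are assumptions.
# 1. The Cluster status lists the FDs copied from the infrastructure cluster.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example-cluster          # hypothetical name
status:
  failureDomains:
    my-fd-1:
      controlPlane: true
    my-fd-2:
      controlPlane: true
---
# 2. KCP selects one FD for each control-plane Machine.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: example-cluster-cp-0     # hypothetical name
spec:
  failureDomain: my-fd-1
---
# 3. CAPM3 copies the value to the corresponding Metal3Machine.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3Machine
metadata:
  name: example-cluster-cp-0     # hypothetical name
spec:
  failureDomain: my-fd-1
```
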
## How to use?

In public clouds, FDs are predefined. In Metal3, users must define FDs
manually and assign them to BareMetalHosts.

1. Label BareMetalHosts with their FD:

   ```yaml
   metadata:
     labels:
       infrastructure.cluster.x-k8s.io/failure-domain: rack-2
   ```

1. Define these FDs in the Metal3Cluster specification:

   ```yaml
   kind: Metal3Cluster
   spec:
     failureDomains:
       my-fd-1:
         controlPlane: true
         attributes:
           datacenter: hki-dc1
           row: A
           rack: "1"  # quoted so the value is a string, not an integer
           powerFeed: pf-1a
       my-fd-2:
         controlPlane: true
         attributes:
           switch: 10Gbps
   ```

CAPM3 checks the `Metal3Machine.Spec.FailureDomain` field. If it is set, CAPM3
tries to select a BareMetalHost (BMH) from the specified FD. If no BMH is
available in that domain, it falls back to an available host in any
other FD.

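As a rough illustration of the selection behaviour, the sketch below shows two hosts labeled with the FD names from the Metal3Cluster example (`my-fd-1`, `my-fd-2`). The host names are hypothetical, and it assumes that the label value is compared with the FD name carried in `Metal3Machine.Spec.FailureDomain`. Only the metadata is shown.

```yaml
# Illustrative excerpts; host names are hypothetical.
# Assumption: the label value is compared with the FD name set in
# Metal3Machine.Spec.FailureDomain.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-a                   # hypothetical name
  labels:
    # Preferred for a Metal3Machine with failureDomain: my-fd-1
    infrastructure.cluster.x-k8s.io/failure-domain: my-fd-1
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-b                   # hypothetical name
  labels:
    # Considered for that Metal3Machine only if no my-fd-1 host is available
    infrastructure.cluster.x-k8s.io/failure-domain: my-fd-2
```
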
**Note:** Users can propagate FD labels to Kubernetes nodes using the [label
synchronization feature](./label_sync.md).

docs/user-guide/src/capm3/features.md

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@
 - [Label synchronization](./label_sync.md)
 - [Data sources](./data_sources.md)
 - [ClusterClass](./clusterclass.md)
+- [Failure domain](./failure_domain.md)
