<!--
This work is licensed under a Creative Commons Attribution 3.0
Unported License.

http://creativecommons.org/licenses/by/3.0/legalcode
-->

# DHCP-less

Discuss options and outline a proposal to enable DHCP-less deployments by
leveraging existing Ironic support for this functionality, without
dependencies on downstream customizations.

## Status

implementable

## Summary

Metal<sup>3</sup> provides an IPAM controller which can be used to enable
deployment with static IPs instead of DHCP. However, it is currently not
possible to use this functionality in a fully DHCP-less environment without
downstream customizations.

This proposal outlines the outstanding issues and potential solutions to
enable an improved DHCP-less solution for Metal<sup>3</sup> users.

## Motivation

Infrastructure management via Metal<sup>3</sup> in DHCP-less environments
is common, but today our upstream features only partially address this
use-case.

Since there are several groups in the community who require this functionality,
it makes sense to collaborate and ensure we can use upstream components where
possible and only depend on downstream customizations where absolutely required.

### Goals

Provide a method to support DHCP-less deployments without any downstream
customizations (except perhaps a different IPA ramdisk image).

Enable e2e integration of the CAPM3 IPAM components such that it's possible
to deploy in a DHCP-less environment using static network configuration
managed via Metal<sup>3</sup> resources.

### Non-Goals

Existing methods used to configure networking via downstream customizations
(such as a custom PreprovisioningImage controller) are valid and will still
sometimes be required. This proposal doesn't aim to replace such methods - the
approach here may be complementary for those users wishing to combine CAPM3
IPAM features with a PreprovisioningImage controller.

This proposal will focus on the Metal<sup>3</sup> components only - there are
also OS dependencies and potential related areas of work in Ironic; these will
be mentioned in the Dependencies section but not covered in detail here.

This proposal will only consider the Metal<sup>3</sup> IPAM controller -
there are other options, but none are currently integrated via CAPM3.

## Proposal

Implement a new CAPM3 controller to handle setting the BareMetalHost
`preprovisioningNetworkDataName` in an automated way via existing
Metal<sup>3</sup> IPAM resources.

### User Stories

#### Static network configuration (no IPAM)

As a user I want to manage my network configuration statically as part of my
BareMetalHost inventory.

In this case the network configuration is provided via a Secret which is
either manually created or templated outside the scope of Metal<sup>3</sup>;
a sketch follows the list below.

The BareMetalHost API already supports two interfaces for passing network
configuration:

* `networkData` - this data is passed to the deployed OS by Ironic via a
  config-drive partition. It is then typically read on first boot by a tool
  such as `cloud-init` which supports the OpenStack network data format.
* `preprovisioningNetworkDataName` - this references a Secret containing
  network data to be applied during the pre-provisioning phase, e.g. to
  configure networking for the IPA deploy ramdisk.

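For illustration, a minimal sketch of this user story might look as follows.
The Secret name, host details, and addresses are hypothetical, the
`networkData` key follows the convention used by CAPM3-generated secrets, and
the network data itself is shown in the OpenStack network data format as YAML
for readability:

```yaml
# Hypothetical example: a manually-created Secret holding network data in the
# OpenStack format, referenced from a BareMetalHost for the pre-provisioning
# phase via preprovisioningNetworkDataName.
apiVersion: v1
kind: Secret
metadata:
  name: host-0-preprov-netdata
type: Opaque
stringData:
  networkData: |
    links:
      - id: eth0
        type: phy
        ethernet_mac_address: "00:11:22:33:44:55"
    networks:
      - id: network0
        type: ipv4
        link: eth0
        ip_address: 192.168.0.10
        netmask: 255.255.255.0
    services:
      - type: dns
        address: 192.168.0.1
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
spec:
  online: true
  bootMACAddress: "00:11:22:33:44:55"
  bmc:
    address: redfish://192.168.0.100/redfish/v1/Systems/1
    credentialsName: host-0-bmc-secret
  preprovisioningNetworkDataName: host-0-preprov-netdata
```
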
The `preprovisioningNetworkDataName` API was added initially to enable [image
building workflows](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#preprovisioningimage),
and a [recent BMO change](https://github.com/metal3-io/baremetal-operator/pull/1380)
landed to enable this flow without any custom PreprovisioningImage controller.

#### IPAM configuration

As a user I wish to make use of the Metal<sup>3</sup> IPAM solution in a
DHCP-less environment.

Metal<sup>3</sup> provides an [IPAM controller](https://github.com/metal3-io/ip-address-manager)
which can be used to allocate IPs as part of the Metal3Machine lifecycle.

Some gaps exist which prevent realizing this flow in a fully DHCP-less
environment, so the main focus of this proposal will be how to solve for this
use-case.

##### IPAM Scenario 1 - common IPPool

An environment where a common configuration is desired for the pre-provisioning
phase and the provisioned BareMetalHost (e.g. a scenario where hosts are
permanently assigned to specific clusters).

##### IPAM Scenario 2 - decoupled preprovisioning/provisioning IPPool

An environment where a decoupled configuration is desired for the
pre-provisioning phase and the provisioned BareMetalHost (e.g. a BMaaS
scenario where the end-user network configuration differs from the
configuration desired during the commissioning phase for inspection/cleaning).

## Design Details

`Metal3MachineTemplate` and `Metal3DataTemplate` are used to apply networkData
to specific BareMetalHost resources, but they are by design coupled to the
CAPI Machine lifecycle.

This is a problem for the pre-provisioning use-case: at this point we are
preparing the BareMetalHost for use, so there is not yet any Machine.

To resolve this, below we outline a proposal to add two new resources with
similar behavior for the pre-provisioning phase: `Metal3PreprovisioningTemplate`
and `Metal3PreprovisioningDataTemplate`.

### API overview

The current flow in the provisioning phase is as follows (only the most
relevant fields are included for clarity):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: data-template
spec:
  clusterName: cluster
  networkData:
    networks:
      ipv4:
        - id: eth0
          ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: machine-template
spec:
  template:
    spec:
      dataTemplate:
        name: data-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: machine-deployment
spec:
  clusterName: cluster
  replicas: 1
  template:
    spec:
      clusterName: cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3MachineTemplate
        name: machine-template
```

In this flow, when a Metal3Machine is provisioned via the `MachineDeployment`,
BareMetalHost resources labeled `cluster-role: control-plane` will have
`networkData` defined with an IP derived from the `pool-1` `IPPool`.

In CAPM3 an IPClaim is created to reserve an IP from the IPPool for each
Machine, and an IPAddress resource contains the data used for templating of
the `networkData`.

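For reference, the intermediate IPAM resources behind this flow look roughly
like the following sketch (resource names and addresses are illustrative; the
actual names are generated by CAPM3):

```yaml
# Sketch: an IPClaim reserving an address from pool-1 for a Machine, and the
# resulting IPAddress carrying the data used to template networkData.
apiVersion: ipam.metal3.io/v1alpha1
kind: IPClaim
metadata:
  name: machine-0-pool-1
spec:
  pool:
    name: pool-1
---
apiVersion: ipam.metal3.io/v1alpha1
kind: IPAddress
metadata:
  name: pool-1-192-168-0-10
spec:
  pool:
    name: pool-1
  claim:
    name: machine-0-pool-1
  address: 192.168.0.10
  prefix: 24
  gateway: 192.168.0.1
```
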
#### Preprovisioning - Common IPPool

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningDataTemplate
metadata:
  name: preprov-data-template
spec:
  preprovisioningNetworkData:
    networks:
      ipv4:
        - id: eth0
          ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
spec:
  template:
    spec:
      dataTemplate:
        name: preprov-data-template
      hostSelector:
        matchLabels:
          pre-provisioning: foo
```

In this flow there is no `MachineDeployment`; BareMetalHost resources labeled
to match the preprov-template hostSelector will have
`preprovisioningNetworkDataName` assigned using the same process outlined for
`networkData` above.

There are a few things to consider:

To avoid the risk of multiple Metal3PreprovisioningTemplate resources matching
the same BareMetalHost (which would be ambiguous), a BMH must match *exactly
one* Metal3PreprovisioningTemplate for the controller to take action; if more
than one matches, the host will be reflected as ignored via the
Metal3PreprovisioningTemplate status.

The preprovisioning network data is used by default for networkData in the
baremetal-operator, so in this configuration it's not strictly necessary to
specify networkData via Metal3DataTemplate. However, we'll want to delete the
IPClaim after preprovisioning in the decoupled flow below, so it seems likely
we'll want to behave consistently and rely on the IP Reuse functionality if a
consistent IP is required between the pre-provisioning and provisioning phases.

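As an illustration, the ambiguous-match reporting could take a shape like the
following; these status fields are hypothetical and would be settled during
implementation:

```yaml
# Hypothetical status shape: host-1 matches more than one template, so the
# controller takes no action for it and reports it as ignored.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
status:
  matchedHosts:
    - host-0
  ignoredHosts:
    - name: host-1
      reason: matches multiple Metal3PreprovisioningTemplates
```
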
#### Preprovisioning - Decoupled IPPool

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: preprovisioning-pool

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningDataTemplate
metadata:
  name: preprov-data-template
spec:
  preprovisioningNetworkData:
    networks:
      ipv4:
        - id: eth0
          ipAddressFromIPPool: preprovisioning-pool

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
spec:
  template:
    spec:
      dataTemplate:
        name: preprov-data-template
      hostSelector:
        matchLabels:
          pre-provisioning: foo

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: data-template
spec:
  clusterName: cluster
  networkData:
    networks:
      ipv4:
        - id: eth0
          ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: machine-template
spec:
  template:
    spec:
      dataTemplate:
        name: data-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane
```

In this flow we have `preprovisioning-pool`, which is not associated with any
cluster; it is used to provide an IPAddress during the pre-provisioning phase
as described above. To reduce the required size of the pool, the IPClaim will
be deleted after the preprovisioning phase is completed, e.g. when the BMH
resource becomes available.

In the provisioning phase another pool, associated with a cluster, is used to
template networkData as in the existing process.

#### Assumptions and Open Questions

TODO

### Inspection on initial registration

On initial registration of a host, inspection is triggered immediately, but in
a DHCP-less environment this process cannot complete without preprovisioning
network configuration (because the IPA ramdisk can't connect back to the
Ironic API).

This can be resolved if the BareMetalHost resources are created with the
existing [paused annotation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#pausing-reconciliation)
set to a pre-determined value (e.g. `metal3.io/preprovisioning`), which can
then be removed by the new controller after `preprovisioningNetworkDataName`
has been set; inspection will then be able to succeed.

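A minimal sketch of this registration flow, assuming the proposed
`metal3.io/preprovisioning` marker value (host details are illustrative):

```yaml
# Sketch: a BareMetalHost registered with the existing BMO paused annotation,
# to be un-paused by the new controller once preprovisioningNetworkDataName
# has been set, allowing inspection to proceed.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
  annotations:
    baremetalhost.metal3.io/paused: metal3.io/preprovisioning
  labels:
    pre-provisioning: foo
spec:
  online: true
  bootMACAddress: "00:11:22:33:44:55"
  bmc:
    address: redfish://192.168.0.100/redfish/v1/Systems/1
    credentialsName: host-0-bmc-secret
```
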
### Implementation Details/Notes/Constraints

#### IP Reuse

A related issue has been previously addressed via the [IP Reuse](https://github.com/metal3-io/cluster-api-provider-metal3/blob/main/docs/ip_reuse.md)
functionality - this means we can couple IPClaims to the BareMetalHost
resources, which will enable consistent IP allocations for pre-provisioning
and subsequent provisioning operations (provided the same IPPool is used for
both steps).

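Where fully deterministic addresses are needed, the IPAM `IPPool` API also
provides `preAllocations`; a sketch, assuming the map is keyed by IPClaim name
(names and addresses are illustrative):

```yaml
# Sketch: pinning a specific address from the pool to a named claim via
# preAllocations, complementary to the IP Reuse functionality.
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: preprovisioning-pool
spec:
  pools:
    - start: 192.168.0.10
      end: 192.168.0.50
  prefix: 24
  gateway: 192.168.0.1
  preAllocations:
    host-0-preprov: 192.168.0.10
```
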
### Risks and Mitigations

- TODO

### Work Items

TODO

### Dependencies

#### Firstboot agent support

An agent in the IPA ramdisk image is required to consume the network data
provided via the processes outlined above.

The Ironic DHCP-less documentation describes using glean (a minimal
Python-based cloud-init alternative), but we don't currently have any
community-supported IPA ramdisk image containing this tool.

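For context, glean consumes the OpenStack network data placed on the config
drive as `network_data.json`; a minimal example of the payload this flow would
produce for it (MACs and addresses are illustrative, shown as YAML for
consistency with the other examples):

```yaml
# Minimal network_data payload (serialized as network_data.json on the
# config drive). MACs and addresses are illustrative.
links:
  - id: eth0
    type: phy
    ethernet_mac_address: "00:11:22:33:44:55"
networks:
  - id: network0
    type: ipv4
    link: eth0
    ip_address: 192.168.0.10
    netmask: 255.255.255.0
    routes:
      - network: 0.0.0.0
        netmask: 0.0.0.0
        gateway: 192.168.0.1
services:
  - type: dns
    address: 192.168.0.1
```
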
#### Potential config-drive conflict on redeployment

TODO

### Test Plan

TODO

### Upgrade / Downgrade Strategy

TODO

### Version Skew Strategy

N/A

## Drawbacks

TODO

## Alternatives

### Kanod

One possibility is to manage the lifecycle of `preprovisioningNetworkDataName`
outside of the Metal<sup>3</sup> core components - such an approach has been
successfully demonstrated in the [Kanod community](https://gitlab.com/Orange-OpenSource/kanod/),
which is related to the [Sylva](https://sylvaproject.org) project.

The design proposal here has been directly inspired by this work, but I think
directly integrating this functionality into CAPM3 has the following advantages:

* We can close a functional gap which potentially impacts many
  Metal<sup>3</sup> users, not only those involved with Kanod/Sylva
* Directly integrating into CAPM3 means we can use a common approach for
  `networkData` and `preprovisioningNetworkData`

## References

TODO
