<!--
This work is licensed under a Creative Commons Attribution 3.0
Unported License.

http://creativecommons.org/licenses/by/3.0/legalcode
-->

# DHCP-less network config templating

Discuss options and outline a proposal to enable DHCP-less network config templating,
leveraging existing CAPM3 IPAM support.

## Status

implementable

## Summary

Metal<sup>3</sup> provides an IPAM controller which can be used to enable
deployment with static IPs instead of DHCP. However, it is not currently
possible to use this functionality in a fully DHCP-less environment because
it does not support network configuration in the pre-provisioning phase.

This proposal outlines some additional network config templating to enable
use of the existing IPAM solution for the pre-provisioning phase, using
a similar approach to the existing templating of `networkData`.

## Motivation

Infrastructure management via Metal<sup>3</sup> in DHCP-less environments
is common, but today our upstream features only partially address this use-case.

Since several groups in the community require this functionality,
it makes sense to collaborate and ensure we can support this use-case.

### Goals

Enable e2e integration of the CAPM3 IPAM components such that it's possible
to deploy in a DHCP-less environment using static network configuration
managed via Metal<sup>3</sup> resources.

### Non-Goals

Existing methods used to configure networking via downstream customizations (such
as a custom PreprovisioningImageController) are valid and will still sometimes
be required; this proposal doesn't aim to replace such methods. The approach here
may be complementary for those users wishing to combine CAPM3 IPAM features with
a PreprovisioningImageController.

This proposal will focus on the Metal<sup>3</sup> components only - there are
also OS dependencies and potential related areas of work in Ironic; these will
be mentioned in the Dependencies section but not covered in detail here.

This proposal will only consider the Metal<sup>3</sup> IPAM controller -
there are other options, but none are currently integrated via CAPM3.

## Proposal

Implement a new CAPM3 controller to handle setting the BareMetalHost `preprovisioningNetworkDataName`
in an automated way via existing Metal<sup>3</sup> IPAM resources.

This will be achieved via an approach similar to the existing templating of `networkData`,
but adjusted to account for the lack of any `Machine` at the pre-provisioning step
of the deployment flow.

### User Stories

#### Static network configuration (no IPAM)

As a user I want to manage my network configuration statically as part of my
BareMetalHost inventory.

In this case the network configuration is provided via a Secret which is
either manually created or templated outside the scope of Metal<sup>3</sup>.

The BareMetalHost API already supports two interfaces for passing network configuration:

* `networkData` - this data is passed to the deployed OS by Ironic via a
  config-drive partition. It is then typically read on firstboot by
  a tool such as `cloud-init` which supports the OpenStack network data format.
* `preprovisioningNetworkDataName` - this data is designed to allow passing data
  during the pre-provisioning phase, e.g. to configure networking for the IPA deploy
  ramdisk.

The `preprovisioningNetworkDataName` API was added initially to enable [image
building workflows](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#preprovisioningimage), and a [recent BMO change](https://github.com/metal3-io/baremetal-operator/pull/1380) landed to enable this flow without any custom PreprovisioningImage controller.

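For illustration, a minimal BareMetalHost sketch referencing both interfaces (the host and Secret names here are hypothetical, and the Secrets are assumed to be created out of band):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
spec:
  online: true
  bootMACAddress: "00:11:22:33:44:55"
  bmc:
    address: redfish://192.168.111.1/redfish/v1/Systems/1
    credentialsName: host-0-bmc-secret
  # Consumed by the deployed OS on firstboot (e.g. via cloud-init)
  networkData:
    name: host-0-networkdata
  # Consumed during pre-provisioning, e.g. by the IPA deploy ramdisk
  preprovisioningNetworkDataName: host-0-preprov-networkdata
```
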
#### IPAM configuration

As a user I wish to make use of the Metal<sup>3</sup> IPAM solution in a
DHCP-less environment.

Metal<sup>3</sup> provides an [IPAM controller](https://github.com/metal3-io/ip-address-manager)
which can be used to allocate IPs as part of the Metal3Machine lifecycle.

Some gaps exist which prevent realizing this flow in a fully DHCP-less environment,
so the main focus of this proposal is how to solve for this use-case.

##### IPAM Scenario 1 - common IPPool

An environment where a common configuration is desired for the pre-provisioning
phase and the provisioned BareMetalHost (e.g. a scenario where hosts are permanently
assigned to specific clusters).

##### IPAM Scenario 2 - decoupled preprovisioning/provisioning IPPool

An environment where a decoupled configuration is desired for the pre-provisioning
phase and the provisioned BareMetalHost (e.g. a BMaaS scenario where the end-user
network configuration differs from that of the commissioning phase, where a different
configuration is desired for inspection/cleaning).

## Design Details

`Metal3MachineTemplate` and `Metal3DataTemplate` are used to apply `networkData` to specific BareMetalHost resources,
but they are by design coupled to the CAPI Machine lifecycle.

This is a problem for the pre-provisioning use-case: at this point we're preparing the BareMetalHost
for use, so there is not yet any Machine.

To resolve this, below we outline a proposal to add two new resources with similar behavior for the
pre-provisioning phase: `Metal3PreprovisioningTemplate` and `Metal3PreprovisioningDataTemplate`.

### API overview

The current flow in the provisioning phase is as follows (only the most relevant fields are included for clarity):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: data-template
spec:
  clusterName: cluster
  networkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: machine-template
spec:
  template:
    spec:
      dataTemplate:
        name: data-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane

---

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: machine-deployment
spec:
  clusterName: cluster
  replicas: 1
  template:
    spec:
      clusterName: cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3MachineTemplate
        name: machine-template
```

In this flow, when a Metal3Machine is provisioned via the `MachineDeployment`, BareMetalHost resources labeled
`cluster-role: control-plane` will have `networkData` defined with an IP derived from the `pool-1` `IPPool`.

In CAPM3 an IPClaim is created to reserve an IP from the IPPool for each Machine, and an IPAddress resource
contains the data used for templating of the `networkData`.

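For reference, a sketch of the intermediate IPAM resources involved (names and addresses are illustrative):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPClaim
metadata:
  name: machine-0-pool-1
spec:
  pool:
    name: pool-1

---

# Created by the IPAM controller to satisfy the IPClaim; the address,
# prefix and gateway below feed the networkData templating
apiVersion: ipam.metal3.io/v1alpha1
kind: IPAddress
metadata:
  name: pool-1-192-168-0-10
spec:
  address: 192.168.0.10
  prefix: 24
  gateway: 192.168.0.1
  pool:
    name: pool-1
  claim:
    name: machine-0-pool-1
```
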
#### Preprovisioning - Common IPPool

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningDataTemplate
metadata:
  name: preprov-data-template
spec:
  preprovisioningNetworkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
spec:
  template:
    spec:
      dataTemplate:
        name: preprov-data-template
      hostSelector:
        matchLabels:
          pre-provisioning: foo
```

In this flow there is no `MachineDeployment`; BareMetalHost resources labeled to match the
preprov-template `hostSelector` will have `preprovisioningNetworkDataName` assigned using the same
process outlined above for `networkData`.

There are a few things to consider:

To avoid the risk of multiple Metal3PreprovisioningTemplate resources matching BareMetalHost resources
(which would be ambiguous), a BMH must match *exactly one* Metal3PreprovisioningTemplate for the
controller to take action; if more than one matches, the host will be reflected as ignored via the
Metal3PreprovisioningTemplate status.

The `preprovisioningNetworkDataName` is used by default for `networkData` in the baremetal-operator,
so in this configuration it's not strictly necessary to specify `networkData` via Metal3DataTemplate.
However, we'll want to delete the IPClaim after preprovisioning in the decoupled flow below, so it
seems likely we'll want to behave consistently and rely on the IP Reuse functionality if a
consistent IP is required between the pre-provisioning and provisioning phases.

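The exact status shape for the new resource is not yet defined; one possible sketch of how ambiguous matches could be reported (field names hypothetical):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
status:
  # Hosts matched by exactly one template; preprovisioningNetworkDataName set
  matchedHosts:
  - host-0
  # Hosts matched by more than one template; ignored as ambiguous
  ignoredHosts:
  - host-1
```
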
#### Preprovisioning - Decoupled IPPool

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: preprovisioning-pool

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningDataTemplate
metadata:
  name: preprov-data-template
spec:
  preprovisioningNetworkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: preprovisioning-pool

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
spec:
  template:
    spec:
      dataTemplate:
        name: preprov-data-template
      hostSelector:
        matchLabels:
          pre-provisioning: foo

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: data-template
spec:
  clusterName: cluster
  networkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: machine-template
spec:
  template:
    spec:
      dataTemplate:
        name: data-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane
```

In this flow we have `preprovisioning-pool`, which is not associated with any cluster; this is used
to provide an IPAddress during the pre-provisioning phase as described above. To reduce the required
size of the pool, the IPClaim will be deleted after the preprovisioning phase is completed, e.g. when
the BMH resource becomes available.

In the provisioning phase another pool, associated with a cluster, is used to template `networkData`
as in the existing process.

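To illustrate the intended lifecycle (names hypothetical), the new controller would create a claim
such as the following during pre-provisioning, then delete it once the BMH becomes available:

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPClaim
metadata:
  # Transient: deleted by the controller once the BMH becomes available,
  # returning the address to preprovisioning-pool
  name: host-0-preprovisioning-pool
spec:
  pool:
    name: preprovisioning-pool
```
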
#### Assumptions and Open Questions

TODO

### Inspection on initial registration

On initial registration of a host, inspection is triggered immediately, but in a DHCP-less environment this process cannot complete without preprovisioning network configuration (because the IPA ramdisk can't connect back to the Ironic API).

To resolve this we can add a new BareMetalHost API field, `PreprovisioningNetworkDataRequired`, which defaults to false; when set to true it indicates that the host cannot move from Registering -> Inspecting until `preprovisioningNetworkDataName` has been set.

An alternative could be to require that the BareMetalHost resources are created with the existing [paused annotation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#pausing-reconciliation), set to a pre-determined value (e.g. `metal3.io/preprovisioning`), which can then be removed by the new controller after `preprovisioningNetworkDataName` has been set, allowing inspection to succeed.

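As a sketch, assuming the proposed field surfaces in the spec as `preprovisioningNetworkDataRequired` (the paused-annotation alternative is shown as a comment):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
  # Alternative: pause reconciliation until the new controller sets
  # preprovisioningNetworkDataName, then remove this annotation
  # annotations:
  #   baremetalhost.metal3.io/paused: metal3.io/preprovisioning
spec:
  online: true
  bmc:
    address: redfish://192.168.111.1/redfish/v1/Systems/1
    credentialsName: host-0-bmc-secret
  # Proposed field: hold the host in Registering until
  # preprovisioningNetworkDataName has been set
  preprovisioningNetworkDataRequired: true
```
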
### Implementation Details/Notes/Constraints

#### IP Reuse

A related issue has previously been addressed via the [IP Reuse](https://github.com/metal3-io/cluster-api-provider-metal3/blob/main/docs/ip_reuse.md) functionality. This means we can couple IPClaims to the BareMetalHost resources, which will enable consistent IP allocations for pre-provisioning and subsequent provisioning operations (provided the same IPPool is used for both steps).

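Separately, the IPPool API also supports static pinning via `preAllocations`, which could guarantee a stable address for a given claim name; a sketch (claim name and addresses illustrative):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster
  namePrefix: pool-1
  pools:
  - start: 192.168.0.10
    end: 192.168.0.50
    prefix: 24
  gateway: 192.168.0.1
  # Pin a specific claim name to a fixed address
  preAllocations:
    host-0-pool-1: 192.168.0.20
```
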
### Risks and Mitigations

- TODO

### Work Items

TODO

### Dependencies

#### Firstboot agent support

An agent in the IPA ramdisk image is required to consume the network data provided via the processes outlined above.

The Ironic DHCP-less documentation describes using glean (a minimal Python-based cloud-init alternative), but we don't
currently have any community-supported IPA ramdisk image containing this tool.

There are several other options such as cloud-init, or even custom scripts/tooling which may be coupled to the OS, so we
do not define a specific solution as part of this proposal.

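For context, the data consumed by such an agent follows the OpenStack network data format; a minimal example of what the rendered pre-provisioning network data might contain (shown as YAML; MAC and addresses illustrative):

```yaml
links:
- id: eth0
  type: phy
  ethernet_mac_address: "00:11:22:33:44:55"
networks:
- id: eth0
  type: ipv4
  link: eth0
  ip_address: 192.168.0.10
  netmask: 255.255.255.0
  routes:
  - network: 0.0.0.0
    netmask: 0.0.0.0
    gateway: 192.168.0.1
services: []
```
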
#### Potential config-drive conflict on redeployment

TODO

### Test Plan

TODO

### Upgrade / Downgrade Strategy

TODO

### Version Skew Strategy

N/A

## Drawbacks

TODO

## Alternatives

### Kanod

One possibility is to manage the lifecycle of `preprovisioningNetworkDataName` outside of
the Metal<sup>3</sup> core components - such an approach has been successfully demonstrated
in the [Kanod community](https://gitlab.com/Orange-OpenSource/kanod/), which is related to
the [Sylva](https://sylvaproject.org) project.

The design proposal here has been directly inspired by this work, but I think integrating
this functionality directly into CAPM3 has the following advantages:

* We can close a functional gap which potentially impacts many Metal<sup>3</sup> users, not only those involved with Kanod/Sylva
* Directly integrating into CAPM3 means we can use a common approach for `networkData` and `preprovisioningNetworkData`

## References

TODO