|
| 1 | +<!-- |
| 2 | + This work is licensed under a Creative Commons Attribution 3.0 |
| 3 | + Unported License. |
| 4 | + |
| 5 | + http://creativecommons.org/licenses/by/3.0/legalcode |
| 6 | +--> |
| 7 | + |
| 8 | +# DHCP-less network config templating |
| 9 | + |
| 10 | +Discuss options and outline a proposal to enable DHCP-less network config templating, |
| 11 | +leveraging existing CAPM3 IPAM support. |
| 12 | + |
| 13 | +## Status |
| 14 | + |
| 15 | +implementable |
| 16 | + |
| 17 | +## Summary |
| 18 | + |
| 19 | +Metal<sup>3</sup> provides an IPAM controller which can be used to enable |
| 20 | +deployment with static IPs instead of DHCP. However, it is not currently |
| 21 | +possible to use this functionality in a fully DHCP-less environment because |
| 22 | +it does not support network configuration in the pre-provisioning phase. |
| 23 | + |
| 24 | +This proposal outlines some additional network config templating to enable |
| 25 | +use of the existing IPAM solution for the pre-provisioning phase, using |
| 26 | +a similar approach to the existing templating of `networkData`. |
| 27 | + |
| 28 | +## Motivation |
| 29 | + |
| 30 | +Infrastructure management via Metal<sup>3</sup> in DHCP-less environments |
| 31 | +is common, but today our upstream features only partially solve for this use-case. |
| 32 | + |
| 33 | +Since there are several groups in the community who require this functionality, |
| 34 | +it makes sense to collaborate and ensure we can support this use-case. |
| 35 | + |
| 36 | +### Goals |
| 37 | + |
| 38 | +Enable e2e integration of the CAPM3 IPAM components such that it's possible |
| 39 | +to deploy in a DHCP-less environment using static network configuration |
| 40 | +managed via Metal<sup>3</sup> resources. |
| 41 | + |
| 42 | +### Non-Goals |
| 43 | + |
| 44 | +Existing methods used to configure networking via downstream customizations (such |
| 45 | +as a custom PreprovisioningImageController) are valid and will still sometimes |
| 46 | +be required. This proposal doesn't aim to replace such methods; the approach here may be |
| 47 | +complementary for those users wishing to combine CAPM3 IPAM features with |
| 48 | +a PreprovisioningImageController. |
| 49 | + |
| 50 | +This proposal will focus on the Metal<sup>3</sup> components only - there are |
| 51 | +also OS dependencies and potential related areas of work in Ironic; these will |
| 52 | +be mentioned in the Dependencies section but not covered in detail here. |
| 53 | + |
| 54 | +This proposal will only consider the Metal<sup>3</sup> IPAM controller - |
| 55 | +there are other options but none are currently integrated via CAPM3. |
| 56 | + |
| 57 | +## Proposal |
| 58 | + |
| 59 | +Implement a new CAPM3 controller to handle setting the BareMetalHost `preprovisioningNetworkDataName` |
| 60 | +in an automated way via existing Metal<sup>3</sup> IPAM resources. |
| 61 | + |
| 62 | +This will be achieved via an approach similar to the existing templating of `networkData` |
| 63 | +but adjusted to account for the lack of any `Machine` at the pre-provisioning step |
| 64 | +of the deployment flow. |
| 65 | + |
| 66 | +### User Stories |
| 67 | + |
| 68 | +#### Static network configuration (no IPAM) |
| 69 | + |
| 70 | +As a user I want to manage my network configuration statically as part of my |
| 71 | +BareMetalHost inventory. |
| 72 | + |
| 73 | +In this case the network configuration is provided via a Secret which is |
| 74 | +either manually created or templated outside the scope of Metal<sup>3</sup>. |
| 75 | + |
| 76 | +The BareMetalHost API already supports two interfaces for passing network configuration: |
| 77 | + |
| 78 | +* `networkData` - this data is passed to the deployed OS by Ironic via a |
| 79 | +  configuration drive partition. It is then typically read on firstboot by |
| 80 | +  a tool such as `cloud-init` which supports the OpenStack network data format. |
| 81 | +* `preprovisioningNetworkDataName` - this data is designed to allow passing data |
| 82 | +  during the preprovisioning phase, e.g. to configure networking for the IPA deploy |
| 83 | +  ramdisk. |
| 84 | + |
| 85 | +The `preprovisioningNetworkDataName` API was added initially to enable [image |
| 86 | +building workflows](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#preprovisioningimage), and a [recent BMO change](https://github.com/metal3-io/baremetal-operator/pull/1380) landed to enable this flow without any custom PreprovisioningImage controller. |
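As an illustration of this static flow, a minimal sketch of a manually created Secret and the BareMetalHost that references it might look as follows. The resource names, namespace, addresses and BMC details are hypothetical, and the `networkData` Secret key and the use of the OpenStack `network_data.json` layout for the payload are assumptions that should be checked against the baremetal-operator API documentation.

```yaml
# Hypothetical Secret holding pre-provisioning network configuration.
# The payload is assumed to follow the OpenStack network_data.json layout.
apiVersion: v1
kind: Secret
metadata:
  name: host-0-preprov-networkdata
  namespace: metal3
type: Opaque
stringData:
  networkData: |
    {
      "links": [
        {"id": "eth0", "type": "phy", "ethernet_mac_address": "00:11:22:33:44:55"}
      ],
      "networks": [
        {
          "id": "eth0",
          "type": "ipv4",
          "link": "eth0",
          "ip_address": "192.168.0.10",
          "netmask": "255.255.255.0",
          "routes": [
            {"network": "0.0.0.0", "netmask": "0.0.0.0", "gateway": "192.168.0.1"}
          ]
        }
      ],
      "services": [
        {"type": "dns", "address": "192.168.0.1"}
      ]
    }
---
# BareMetalHost referencing the Secret above (same namespace).
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
  namespace: metal3
spec:
  online: true
  bootMACAddress: "00:11:22:33:44:55"
  bmc:
    address: redfish://bmc.example.com/redfish/v1/Systems/1  # placeholder
    credentialsName: host-0-bmc-secret                       # placeholder
  preprovisioningNetworkDataName: host-0-preprov-networkdata
```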
| 87 | + |
| 88 | +#### IPAM configuration |
| 89 | + |
| 90 | +As a user I wish to make use of the Metal<sup>3</sup> IPAM solution, in a |
| 91 | +DHCP-less environment. |
| 92 | + |
| 93 | +Metal<sup>3</sup> provides an [IPAM controller](https://github.com/metal3-io/ip-address-manager) |
| 94 | +which can be used to allocate IPs used as part of the Metal3Machine lifecycle. |
| 95 | + |
| 96 | +Some gaps exist which prevent realizing this flow in a fully DHCP-less environment, |
| 97 | +so the main focus of the proposal will be how to solve for this use-case. |
| 98 | + |
| 99 | +##### IPAM Scenario 1 - common IPPool |
| 100 | + |
| 101 | +An environment where a common configuration is desired for the pre-provisioning |
| 102 | +phase and the provisioned BareMetalHost (e.g. a scenario where hosts are permanently |
| 103 | +assigned to specific clusters). |
| 104 | + |
| 105 | +##### IPAM Scenario 2 - decoupled preprovisioning/provisioning IPPool |
| 106 | + |
| 107 | +An environment where a decoupled configuration is desired for the pre-provisioning |
| 108 | +phase and the provisioned BareMetalHost (e.g. a BMaaS scenario where the end-user network configuration |
| 109 | +differs from the commissioning phase, where a different configuration is desired for inspection/cleaning). |
| 110 | + |
| 111 | +## Design Details |
| 112 | + |
| 113 | +`Metal3MachineTemplate` and `Metal3DataTemplate` are used to apply networkData to specific BareMetalHost resources, |
| 114 | +but they are by design coupled to the CAPI Machine lifecycle. |
| 115 | + |
| 116 | +This is a problem for the pre-provisioning use-case since at this point we're preparing the BareMetalHost for |
| 117 | +use and there is not yet any Machine. |
| 118 | + |
| 119 | +To resolve this, we outline below a proposal to add two new resources with similar behavior for the pre-provisioning |
| 120 | +phase: `Metal3PreprovisioningTemplate` and `Metal3PreprovisioningDataTemplate`. |
| 121 | + |
| 122 | +### API overview |
| 123 | + |
| 124 | +The current flow in the provisioning phase is as follows (only the most relevant fields are included for clarity): |
| 125 | + |
| 126 | +```yaml |
| 127 | +apiVersion: ipam.metal3.io/v1alpha1 |
| 128 | +kind: IPPool |
| 129 | +metadata: |
| 130 | +  name: pool-1 |
| 131 | +spec: |
| 132 | +  clusterName: cluster |
| 133 | + |
| 134 | +--- |
| 135 | + |
| 136 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 137 | +kind: Metal3DataTemplate |
| 138 | +metadata: |
| 139 | +  name: data-template |
| 140 | +spec: |
| 141 | +  clusterName: cluster |
| 142 | +  networkData: |
| 143 | +    networks: |
| 144 | +      ipv4: |
| 145 | +      - id: eth0 |
| 146 | +        ipAddressFromIPPool: pool-1 |
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 151 | +kind: Metal3MachineTemplate |
| 152 | +metadata: |
| 153 | +  name: machine-template |
| 154 | +spec: |
| 155 | +  template: |
| 156 | +    spec: |
| 157 | +      dataTemplate: |
| 158 | +        name: data-template |
| 159 | +      hostSelector: |
| 160 | +        matchLabels: |
| 161 | +          cluster-role: control-plane |
| 162 | + |
| 163 | +--- |
| 164 | +apiVersion: cluster.x-k8s.io/v1beta1 |
| 165 | +kind: MachineDeployment |
| 166 | +metadata: |
| 167 | +  name: machine-deployment |
| 168 | +spec: |
| 169 | +  clusterName: cluster |
| 170 | +  replicas: 1 |
| 171 | +  template: |
| 172 | +    spec: |
| 173 | +      clusterName: cluster |
| 174 | +      infrastructureRef: |
| 175 | +        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 176 | +        kind: Metal3MachineTemplate |
| 177 | +        name: machine-template |
| 178 | +``` |
| 179 | + |
| 180 | +In this flow when a Metal3Machine is provisioned via the `MachineDeployment`, BareMetalHost resources labeled |
| 181 | +`cluster-role: control-plane` will have `networkData` defined with an IP derived from the `pool-1` `IPPool`. |
| 182 | + |
| 183 | +In CAPM3 an IPClaim is created to reserve an IP from the IPPool for each Machine, and an IPAddress resource |
| 184 | +contains the data used for templating of the `networkData`. |
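For orientation, the IPClaim and the resulting IPAddress might look roughly like the sketch below. Resource names, namespace and address values are hypothetical, and the exact field set should be confirmed against the Metal<sup>3</sup> ip-address-manager API.

```yaml
# Claim created on behalf of a Machine to reserve one address from pool-1.
apiVersion: ipam.metal3.io/v1alpha1
kind: IPClaim
metadata:
  name: machine-0-pool-1        # hypothetical name
  namespace: metal3
spec:
  pool:
    name: pool-1
---
# Address allocated by the IPAM controller in response to the claim.
# These values are what the networkData template is rendered from.
apiVersion: ipam.metal3.io/v1alpha1
kind: IPAddress
metadata:
  name: pool-1-192-168-0-10     # hypothetical name
  namespace: metal3
spec:
  pool:
    name: pool-1
  claim:
    name: machine-0-pool-1
  address: 192.168.0.10
  prefix: 24
  gateway: 192.168.0.1
```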
| 185 | + |
| 186 | +#### Preprovisioning - Common IPPool |
| 187 | + |
| 188 | +```yaml |
| 189 | +apiVersion: ipam.metal3.io/v1alpha1 |
| 190 | +kind: IPPool |
| 191 | +metadata: |
| 192 | +  name: pool-1 |
| 193 | +spec: |
| 194 | +  clusterName: cluster |
| 195 | + |
| 196 | +--- |
| 197 | + |
| 198 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 199 | +kind: Metal3PreprovisioningDataTemplate |
| 200 | +metadata: |
| 201 | +  name: preprov-data-template |
| 202 | +spec: |
| 203 | +  preprovisioningNetworkData: |
| 204 | +    networks: |
| 205 | +      ipv4: |
| 206 | +      - id: eth0 |
| 207 | +        ipAddressFromIPPool: pool-1 |
| 208 | + |
| 209 | +--- |
| 210 | + |
| 211 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 212 | +kind: Metal3PreprovisioningTemplate |
| 213 | +metadata: |
| 214 | +  name: preprov-template |
| 215 | +spec: |
| 216 | +  template: |
| 217 | +    spec: |
| 218 | +      dataTemplate: |
| 219 | +        name: preprov-data-template |
| 220 | +      hostSelector: |
| 221 | +        matchLabels: |
| 222 | +          pre-provisioning: foo |
| 223 | +``` |
| 224 | + |
| 225 | +In this flow there is no `MachineDeployment`; BareMetalHost resources labeled to match the |
| 226 | +`preprov-template` `hostSelector` will have `preprovisioningNetworkDataName` assigned using the same process |
| 227 | +outlined above for `networkData`. |
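A sketch of a BareMetalHost that would be selected by this template is shown below. The host name and the Secret name are hypothetical; under this proposal the `preprovisioningNetworkDataName` value would be written by the new controller rather than by the user.

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0                      # hypothetical host
  labels:
    pre-provisioning: foo           # matches the preprov-template hostSelector
spec:
  online: true
  # Assumed to be set by the proposed controller once the templated
  # Secret has been rendered; the Secret name here is illustrative only.
  preprovisioningNetworkDataName: host-0-preprov-networkdata
```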
| 228 | + |
| 229 | +There are a few things to consider: |
| 230 | + |
| 231 | +To avoid the risk of multiple Metal3PreprovisioningTemplate resources matching a BareMetalHost resource (which would be ambiguous), |
| 232 | +a BMH must match *exactly one* Metal3PreprovisioningTemplate for the controller to take action. If more than one matches, the BMH will be |
| 233 | +ignored, and this will be reflected via the Metal3PreprovisioningTemplate status. |
| 234 | + |
| 235 | +The `preprovisioningNetworkDataName` is used by default for `networkData` in the baremetal-operator, so in this configuration it's not |
| 236 | +strictly necessary to specify `networkData` via Metal3DataTemplate. However, we'll want to delete the IPClaim after preprovisioning |
| 237 | +in the decoupled flow below, so it seems likely we'll want to behave consistently and rely on the IP Reuse functionality if a |
| 238 | +consistent IP is required between the pre-provisioning and provisioning phases. |
| 239 | + |
| 240 | +#### Preprovisioning Decoupled IPPool |
| 241 | + |
| 242 | +```yaml |
| 243 | +apiVersion: ipam.metal3.io/v1alpha1 |
| 244 | +kind: IPPool |
| 245 | +metadata: |
| 246 | +  name: pool-1 |
| 247 | +spec: |
| 248 | +  clusterName: cluster |
| 249 | + |
| 250 | +--- |
| 251 | + |
| 252 | +apiVersion: ipam.metal3.io/v1alpha1 |
| 253 | +kind: IPPool |
| 254 | +metadata: |
| 255 | +  name: preprovisioning-pool |
| 256 | + |
| 257 | +--- |
| 258 | + |
| 259 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 260 | +kind: Metal3PreprovisioningDataTemplate |
| 261 | +metadata: |
| 262 | +  name: preprov-data-template |
| 263 | +spec: |
| 264 | +  preprovisioningNetworkData: |
| 265 | +    networks: |
| 266 | +      ipv4: |
| 267 | +      - id: eth0 |
| 268 | +        ipAddressFromIPPool: preprovisioning-pool |
| 269 | + |
| 270 | +--- |
| 271 | + |
| 272 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 273 | +kind: Metal3PreprovisioningTemplate |
| 274 | +metadata: |
| 275 | +  name: preprov-template |
| 276 | +spec: |
| 277 | +  template: |
| 278 | +    spec: |
| 279 | +      dataTemplate: |
| 280 | +        name: preprov-data-template |
| 281 | +      hostSelector: |
| 282 | +        matchLabels: |
| 283 | +          pre-provisioning: foo |
| 284 | + |
| 285 | +--- |
| 286 | + |
| 287 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 288 | +kind: Metal3DataTemplate |
| 289 | +metadata: |
| 290 | +  name: data-template |
| 291 | +spec: |
| 292 | +  clusterName: cluster |
| 293 | +  networkData: |
| 294 | +    networks: |
| 295 | +      ipv4: |
| 296 | +      - id: eth0 |
| 297 | +        ipAddressFromIPPool: pool-1 |
| 298 | + |
| 299 | +--- |
| 300 | + |
| 301 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 |
| 302 | +kind: Metal3MachineTemplate |
| 303 | +metadata: |
| 304 | +  name: machine-template |
| 305 | +spec: |
| 306 | +  template: |
| 307 | +    spec: |
| 308 | +      dataTemplate: |
| 309 | +        name: data-template |
| 310 | +      hostSelector: |
| 311 | +        matchLabels: |
| 312 | +          cluster-role: control-plane |
| 313 | + |
| 314 | +``` |
| 315 | + |
| 316 | +In this flow we have `preprovisioning-pool`, which is not associated with any cluster. It is used to provide an IPAddress during |
| 317 | +the pre-provisioning phase as described above. To reduce the required size of the pool, the IPClaim will be deleted after the |
| 318 | +preprovisioning phase is completed, e.g. when the BMH resource becomes available. |
| 319 | + |
| 320 | +In the provisioning phase another pool, associated with a cluster, is used to template `networkData` as in the existing process. |
| 321 | + |
| 322 | +#### Assumptions and Open Questions |
| 323 | + |
| 324 | +TODO |
| 325 | + |
| 326 | +### Inspection on initial registration |
| 327 | + |
| 328 | +On initial registration of a host, inspection is triggered immediately, but in a DHCP-less environment this process cannot complete without preprovisioning network configuration (because the IPA ramdisk can't connect back to the Ironic API). |
| 329 | + |
| 330 | +To resolve this we can add a new BareMetalHost API field `PreprovisioningNetworkDataRequired` which defaults to false. When set to true, it indicates that the host cannot move from Registering -> Inspecting until `preprovisioningNetworkDataName` has been set. |
| 331 | + |
| 332 | +An alternative could be to require that the BareMetalHost resources are created with the existing [paused annotation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#pausing-reconciliation), set to a pre-determined value (e.g. `metal3.io/preprovisioning`). The new controller can then remove the annotation after `preprovisioningNetworkDataName` has been set, at which point inspection will be able to succeed. |
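The two options above could be sketched as follows. The spec field name and its casing in the first document are assumptions (the field does not exist today), host names are hypothetical, and the annotation key in the second document is assumed to be the existing baremetal-operator paused annotation.

```yaml
# Option 1: proposed new field (hypothetical name and casing).
# The host would stay in Registering until preprovisioningNetworkDataName is set.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-a
spec:
  online: true
  preprovisioningNetworkDataRequired: true
---
# Option 2: host created paused with a pre-determined annotation value.
# The new controller would remove the annotation after setting
# preprovisioningNetworkDataName.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-b
  annotations:
    baremetalhost.metal3.io/paused: metal3.io/preprovisioning
spec:
  online: true
```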
| 333 | + |
| 334 | +### Implementation Details/Notes/Constraints |
| 335 | + |
| 336 | +#### IP Reuse |
| 337 | + |
| 338 | +A related issue has been previously addressed via the [IP Reuse](https://github.com/metal3-io/cluster-api-provider-metal3/blob/main/docs/ip_reuse.md) functionality. This means we can couple IPClaims to the BareMetalHost resources, which will enable consistent IP allocations for pre-provisioning and subsequent provisioning operations (provided the same IPPool is used for both steps). |
| 339 | + |
| 340 | +### Risks and Mitigations |
| 341 | + |
| 342 | +- TODO |
| 343 | + |
| 344 | +### Work Items |
| 345 | + |
| 346 | +TODO |
| 347 | + |
| 348 | +### Dependencies |
| 349 | + |
| 350 | +#### Firstboot agent support |
| 351 | + |
| 352 | +An agent in the IPA ramdisk image is required to consume the network data provided via the processes outlined above. |
| 353 | + |
| 354 | +The Ironic DHCP-less documentation describes using glean (a minimal Python-based cloud-init alternative), but we don't |
| 355 | +currently have any community-supported IPA ramdisk image containing this tool. |
| 356 | + |
| 357 | +There are several other options such as cloud-init, or even custom scripts/tooling which may be coupled to the OS, so we |
| 358 | +do not define a specific solution as part of this proposal. |
| 359 | + |
| 360 | +#### Potential config-drive conflict on redeployment |
| 361 | + |
| 362 | + |
| 363 | +### Test Plan |
| 364 | + |
| 365 | +TODO |
| 366 | + |
| 367 | +### Upgrade / Downgrade Strategy |
| 368 | + |
| 369 | +TODO |
| 370 | + |
| 371 | +### Version Skew Strategy |
| 372 | + |
| 373 | +N/A |
| 374 | + |
| 375 | +## Drawbacks |
| 376 | + |
| 377 | +TODO |
| 378 | + |
| 379 | +## Alternatives |
| 380 | + |
| 381 | + |
| 382 | +### Kanod |
| 383 | + |
| 384 | +One possibility is to manage the lifecycle of `preprovisioningNetworkDataName` outside of |
| 385 | +the Metal<sup>3</sup> core components - such an approach has been successfully demonstrated |
| 386 | +in the [Kanod community](https://gitlab.com/Orange-OpenSource/kanod/) which is related to |
| 387 | +the [Sylva](https://sylvaproject.org) project. |
| 388 | + |
| 389 | +The design proposal here has been directly inspired by this work, but I think directly integrating |
| 390 | +this functionality into CAPM3 has the following advantages: |
| 391 | + |
| 392 | +* We can close a functional gap which potentially impacts many Metal<sup>3</sup> users, not only those involved with Kanod/Sylva |
| 393 | +* Directly integrating into CAPM3 means we can use a common approach for `networkData` and `preprovisioningNetworkData` |
| 394 | + |
| 395 | +## References |
| 396 | + |
| 397 | +TODO |