diff --git a/design/baremetal-operator/node-specific-kernel-args.md b/design/baremetal-operator/node-specific-kernel-args.md new file mode 100644 index 00000000..8dc322d7 --- /dev/null +++ b/design/baremetal-operator/node-specific-kernel-args.md @@ -0,0 +1,201 @@ + + +# pre-provisioning kernel argument support + +## Status + +planned + +## Summary + +This document proposes a new BareMetalHost spec attribute intended for +configuring the pre-provisioning image's (usually IPA) kernel arguments. + +## Motivation + +There are situations where a user would need node specific kernel arguments +for the pre-provisioning image. The reason could be the use of a mixed set of +hardware, troubleshooting provisioning or boot issues, selectively configuring +custom logic e.g. enabling IPA plugins or other services embedded in the +pre-provisioning image. + +### Goals + +1. Implement a string type optional spec field named `preprovisioningExtraKernelParams` +2. Extend the generic `Provisioner` interface to support the new optional field +3. Implement support for the `preprovisioningExtraKernelParams` in the Ironic provisioner +4. Combine the values of the `preprovisioningExtraKernelParams` and the + `preprovisioningImage`'s `status.extraKernelParams` field if a + `preprovisioningImage` is refferences by the BMH and teh `status.extraKernelParams` + is not empty +5. Introducing a `preprovisioningExtraKernelParams` related status field that + would display the last applied `preprovisioningExtraKernelParams` value. + Let's call this field `status.lastAppliedPreprovisioningExtraKernelParams` + +### Non-Goals + +1. To implement any sort validation for the new optional field +2. Providing a default value for the new spec field +3. Implement conditional logic to overwrite default kernel parameters on + per node basis + +## Proposal + +This document proposes a new BareMetalHost spec attribute intended for +configuring the pre-provisioning image's (usually IPA) kernel arguments. + +The proposal involves the introduction of a new spec field thus a modification +to the BMO API. The API change is intended to be backward compatible as the +new `preprovisioningExtraKernelParams` field is optional. + +### Implementation Details/Notes/Constraints + +The `preprovisioningExtraKernelParams` field would be located directly under `spec` +section. The format of the new field would be a list/array of individual +strings. The new field would be optional and it would be considered +either a NOOP or a cleaning operation depending on the previous value of the +field in case it would be not present or have no value or would have an empty +array as a value. + +The proposed BMH looks like the following: + +```yaml +apiVersion: metal3.io/v1alpha1 +kind: BareMetalHost +metadata: + name: bm0 +spec: + online: true + bmc: + address: idrac://192.168.122.1:6230/ + credentialsName: bm0-bmc-secret + preprovisioningExtraKernelParams: "--timeout 60000 systemd.journald.forward_to_console=yes ipa-debug=1" + ... + ... + ... + bootMACAddress: 52:54:00:b7:b2:6f +... +... +... + +``` + +The example contains a snippet of a common IPA kernel argument list. +The `preprovisioningExtraKernelParams` string would be received by the `Provisioner` +interface as is. The handling of the list would be up to the particular +implementations of the `Provisioner` interface. + +Given the BMH snippet from above, the kernel parameter list generated by +the `Ironic provisioner` and posted to the Ironic Node API would look like +almost the same with one important addition. The ironic provisioner will +always prepend the `%default%` substring to the `preprovisioningExtraKernelParams`. + +The addition of the "%default%" will cause Ironic to inject the extra +kernel parameters provided via the ironic config file and ironic-image +environment variables. The inclusion of "%default%" is beneficial from +multiple point of view, it makes the API based kernel parameter update logic +consistent with the same feature used for dynamically built `preprovisioningImage`s. +An other benefit is that kernel parameters coming from environment variables +might contain credentials like ssh keys, or other sensitive information +that would cause a security issue to display in a spec field. + +It is possible to implement an annotation or an other spec field to +force BMO to overwrite the node specific kernel arguments. Implementing +logic to conditionally overwrite the parameters might prove useful in some +scenarios but it would completely deviate from how users are applying +kernel parameters both with pre built or with dynamically build pre provisioning +images thus using this feature might cause unexpected issues. + +In this proposal I am proposing the minimal viable feature to add node +specific kernel arguments, in case in the future there will be interest in +fully overwriting kernel parameters then the proposed feature can be extended +with the relevant logic. + +The `status.lastAppliedPreprovisioningExtraKernelParams` would have some limitations. +An Ironic node's or other provisioners relevant API endpoint could be updated +with extra kernel parameters such a way to cause the preprovisioning agent to not +boot properly or behave incorrectly even though the HTTP API patching was +successful. The status in the aforementioned case would only show what was +accepted by the provisioner API last time but would not provide info about +the sensibility of the value and wouldn't even mean that the value is actively +used because an updated `preprovisioningExtraKernelParams` value would be only +used at the next suitable life cycle operation. + +### Risks and Mitigations + +Providing a faulty list of kernel arguments might result in boot issues +or unexpected behavior during the life cycle of the pre-provisioning image. +There is no possibility to point out explicitly when the provisioning image +boot process fails as the provisioner would simply not receive any data so +it would time out, but the same is true in case of network or hardware +issues so user won't be able to distinguish without investigation. + +## Design Details + +Most of the design is very straightforward and easy to combine with the +reconcile logic, the support for node specific pre-provisioning kernel args +also already exist in Ironic, so right after the feature is implemented in BMO +it can be used without waiting for Ironic implementation to arrive. + +The only challenge is the process of updating/cleaning the previously applied +kernel argument list, this cleanup/update process highly depends on the +particular provisioner. + +This document proposes extending the existing "NodeUpdate" functions of the +Ironic provisioner utilized during provisioning, inspection and registration +with updating the kernel parameters of the Ironic driver_info. In case of +servicing and preparing states/operations "NodeUpdate"s have to be added this +based on the user's workflow there might be a few extra node updates based on +how many tines are the servicing or preparing logic is run. + +### Work Items + +- Implement a string type optional spec field named `preprovisioningExtraKernelParams` +- Extend the generic `Provisioner` interface to support the new optional field +- Implement support for the `preprovisioningExtraKernelParams` in the Ironic driver +- Extend the data types used in different operational states to hold + an entry for `preprovisioningExtraKernelParams` +- Implement unit tests, and align existing unit tests as needed +- Add a check to the existing BMO e2e workflow that checks that the redfish + images and the PXE config files to verify that the provisioner image will + receive the correct parameters. Logging into a running IPA could also work + but in e2e tests Metal3 CI is using upstream IPA images that don't allow + users to log in via neither serial or ssh connections. + +### Dependencies + +- Ironic in case of the `Ironic Provisioner` + +### Test Plan + +- Unit tests for the functions +- Integration testing with VMs, most likely as part of the existing + e2e workflow. + +### Upgrade / Downgrade Strategy + +- In case of upgrade user would not need to do anything +- In case of downgrade the user has to remove the field from the BMH + to both make the BMH compatible with the older API and to allow the + particular provisioner in use to return using the defaults arguments list. + +### Version Skew Strategy + +None + +## Drawbacks + +None + +## Alternatives + +None + +## Reference + +None