
Conversation


@alegacy alegacy commented Oct 15, 2025

This adds a proposed design for supporting the new Ironic Networking service, which provides the ability to control Top-of-Rack (ToR) switches without requiring any other OpenStack services (e.g., Neutron).

The Ironic Networking service is still under development. It is working its way through the review process, which you can follow here:
https://review.opendev.org/q/(status:open+or+is:wip)+topic:feature/standalone-networking

@metal3-io-bot metal3-io-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 15, 2025
@metal3-io-bot
Contributor

Hi @alegacy. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@metal3-io-bot metal3-io-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 15, 2025
@dtantsur
Member

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 16, 2025
@alegacy alegacy force-pushed the feature/standalone-networking branch from e13ce12 to c8c325e Compare October 16, 2025 11:26
@metal3-io-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign dtantsur for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

spec:
# ... other fields ...
networkInterfaces:
- macAddress: "aa:bb:cc:dd:ee:01"
Member

Is there any value in making each record here a separate CRD (like HostNetworkInterface)? This is not a suggestion; I'm genuinely asking.

Author

I thought about that, but I don't think there is any value in it; in fact, I think it unnecessarily drives up the complexity. The information is tightly coupled to the BMH, and there is no chance it could be reused by reference anywhere else, so I think this is the best place for it.

Name string `json:"name"`

// Namespace of the BareMetalNetworkAttachment (defaults to BMH namespace)
Namespace string `json:"namespace,omitempty"`
Member

This may have security implications. We already had to fix a CVE where referencing a secret from another namespace could be used for privilege escalation. A HostNetworkAttachment should not contain anything sensitive, but my gut feeling is to avoid cross-namespace referencing.

Do you have a strong case for it?

Author

OK, maybe we need to discuss this a bit then. The reason I did it this way is that my assumption is that BMHs may be organized into groups and put into namespaces according to those groups (whatever they may be). Someone may then end up picking BMHs from different groups to build a cluster, and that cluster would have a set of NetworkAttachments that should be applied to each of those BMHs. The attachments and BMHs wouldn't necessarily all be in the same namespace.


This file will be stored in a Secret and mounted to the Ironic Networking
container. When any `BareMetalSwitch` is created, updated, or deleted, the
Secret will be regenerated and the Ironic Networking service will be restarted.
Member

Continuing from above: this is a bit of a problematic point, since the Ironic pod(s) are not managed by baremetal-operator but rather by ironic-standalone-operator (upstream) or CBO (OpenShift).

If we can make ironic-networking read these files on the fly, maybe we can change the mounted Secret at runtime without restarting anything? I'm open to other ideas too.

Author

I had code in my Ironic review to monitor the file and dynamically load the drivers, but I was asked to take it out for the initial pass and maybe add it back later if needed. That would be my preference, as I don't like restarting services to reload configuration, but, as the outsider, I don't want to break the model that already exists.

6. **Container Deployment**: The Ironic Networking service will run as a
separate container in the metal3 pod, alongside the existing metal3-ironic
container. It will share the `/etc/ironic` volume for configuration files
and the htpasswd file.
Member

These files are generated on the fly in Metal3, so I really wonder if you need to share the pod. Not sharing the pod will make at least one thing much easier: you won't be forced into host networking.

Author

I was just trying to follow the existing pattern. Of course, if it is better/easier/simpler to just use a separate Pod, then I'm open to that.

Member

Our existing patterns are sometimes problematic and reflect the past desire to do things quickly :)

8. **Service Network Scope**: The service network configuration applies to a
subset of the Ironic provider networks. Specifically, it configures only
the idle and inspection networks. All other provider networks (provisioning,
cleaning, rescuing, servicing) will default to the tenant VLAN configured on
Member

This seems wrong: the service network applies to all of these. Most people will want the same network for inspection, cleaning, and the like, while having separate network(s) for the tenant workload.

nit: Metal3 does not support rescuing.

Author

My thinking was this: in the OpenShift case at least, all operations are always done on the final VLAN (i.e., the "tenant" VLAN), since ToR configuration is done manually ahead of time. Therefore, it seemed reasonable to assume that everything except "inspection" could be done on the tenant VLAN. Inspection is the exception because we need hosts to sit on the "idle" VLAN prior to being deployed so that initial inspection always works.

Today, I have coded the Ironic standalone-experimental driver such that any provider network that doesn't have a specific VLAN configured in the config file falls back to the tenant VLAN (i.e., the VLAN in the port.extra.switchport attribute). Therefore, if we only set idle_network and inspection_network, all other networks will default to the tenant network, and things should work as they do today (except for inspection, which would happen on the new "idle" network).

Recall the demo I did a few weeks ago: you pointed out that the final tenant VLAN seemed to get configured too early in the provisioning phase and that it should not happen until the final deploy step. I tried to address that by setting all of the provider networks to the same VLAN as the idle network. That didn't work, because in OpenShift the assisted installer image is booted before the final image is deployed. That image ran DHCP on all interfaces, noticed that all interfaces got an IP address on the same network, and complained and refused to finish the install. So I reverted to this approach of setting inspection=idle and all of the rest to the tenant VLAN, and that worked.

Perhaps another discussion is in order here.

Member

These assumptions are very-very different from the assumptions Ironic has about networking.

The strong expectation is that this feature is used to establish a security boundary between the provisioning network (where insecure PXE traffic happens and where IPA is running) and tenant networks (with also potentially insecure workflows), as well as between tenant networks. You're excluding the first part, which the Ironic community sees as pretty important.

In fact, there is nothing special about the inspection network from the Ironic perspective. With fast track (which we need to make work eventually, remember), inspection flows into cleaning, which flows into provisioning. The security properties of all three networks are the same, and so is the desire not to mix the PXE/IPA/whatever traffic with the customer's workloads.

Assisted installer is, yes, a different topic. We need to make it work on the "service" networks, especially since inspection can be both enabled and disabled, and fast track basically means that the assisted agent may end up running on the inspection network.

Author

Ok, I'll schedule a call with you to level set on some of these concepts.

**Mitigation**: Implement thorough validation of `BareMetalSwitch` and
`BareMetalNetworkAttachment` resources. Add status conditions to report
configuration errors. Ensure service network is properly configured before
attempting to provision hosts.
Member

We need to make sure that if cleaning is disabled, we don't try to access switches or validate anything on BMH deletion, so that even broken hosts can be cleanly deleted.

Author

I think that's OK. With my latest revision in Ironic, setting the port's VLAN back to the "idle" VLAN should happen naturally without any extra steps, so when transitioning into a state where delete would be accepted, the switch port should already be fine.

@alegacy alegacy force-pushed the feature/standalone-networking branch 2 times, most recently from c6f27ae to 7989fef Compare October 17, 2025 19:55
@alegacy
Author

alegacy commented Oct 17, 2025

@dtantsur I've addressed the minor items; we'll need to discuss some of the larger ones next week. Thank you for your review!

@tuminoid
Member

/cc @lentzi90 @Rozzii @kashifest

This adds a proposed design to add support for the new
Ironic Networking service which provides the ability to
control Top-of-Rack (TOR) switches without requiring any
other OpenStack services (e.g., Neutron).

Signed-off-by: Allain Legacy <[email protected]>
@alegacy alegacy force-pushed the feature/standalone-networking branch from 7989fef to e9ed651 Compare October 21, 2025 18:47
@kashifest
Member

Overall I don't have any objection to the idea; rather, I think it's a welcome addition. I would need time to do a deep dive on the technical aspects of how the proposed functionality would land in Metal3.

@diconico07 diconico07 left a comment

It is a nice feature overall; I have some concerns about how tightly it is tied to IrSO by design here.


### Ironic Configuration

When `Ironic.NetworkingService.Enabled` is true, Metal³ will configure


This would only be valid for a deployment using IrSO; there should also be a clear way to deploy without IrSO, I believe.

Author

My apologies. I understood IrSO to be the primary deployment method. Could you point me to some documentation that describes the existing non-IrSO deployments?

Member

We're moving towards IrSO being the primary way of deployment. There is nothing wrong with other people using other means (in fact, there have been requests to bring the Suse's charts upstream), but Allain is right in defaulting to describing the IrSO approach.


True, I missed the change in the docs to phase out other ways of deploying (the last time I looked at that part, all methods were still listed equally).

7. **Network Driver Selection**: The Baremetal Operator will automatically set
the Ironic node's network interface driver to `standalone-experimental`
when:
- `enabled` is true in the `Ironic.NetworkingService.Enabled` resource


The Ironic resource is not managed by BMO, and BMO shouldn't rely on its presence, so this will require a configuration option for BMO to enable that behavior rather than using the Ironic resource state here.

Author

Yes, sorry, I meant to reword this to indicate that BMO should accept an environment variable (e.g., IRONIC_NETWORKING_ENABLED) to control this behaviour, defaulted to false of course. I'll revise.

@alegacy
Author

alegacy commented Oct 31, 2025

It is a nice feature overall; I have some concerns about how tightly it is tied to IrSO by design here.

Thanks @diconico07. I'll review the user guide and amend the document with information about non-IrSO scenarios.
