(this is all based on a meeting this morning with @jgallagher, @internet-diglett, and me -- my apologies for any errors here)
Background
- In real systems, Dendrite configuration is switch-specific. Switches may have different uplinks available, different routing rules, even different external connectivity (in future multi-rack work). That means Nexus always needs to know which switch it's talking to.
- (I believe) Dendrite does not currently know which switch it is (and doesn't need to).
- In a real system, MGS (Management Gateway) is the source of truth for which switch the "switch zone" is attached to.
In order for Nexus to do anything with switches, it needs to be able to find the Dendrite instances for each switch. There are two parts to this problem:
- knowing the (IP address, TCP ports) where Dendrite instances are running
- knowing which switch each Dendrite instance is attached to
How this works today
Finding services
To find the switch zone services:
- Nexus discovers the switch zone addresses by looking up Dendrite's SRV records and taking just the IP addresses
- Nexus then figures out which is which by asking the MGS at a hardcoded port at the same address
- Nexus then reaches the other services using these addresses plus hardcoded ports -- e.g., `let port = DENDRITE_PORT;` (line 1118 in 032c556)
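To make the shape of this concrete, here's a minimal sketch of that lookup pattern. Everything here is invented for illustration (the `resolve_srv` helper, the `SwitchZoneAddrs` struct); only the port values come from the DNS diff shown further down.

```rust
// Sketch only, not the real Nexus code: resolve the Dendrite SRV records,
// keep just the IP addresses, then attach the hardcoded production ports
// for each switch zone service.
use std::net::{Ipv6Addr, SocketAddrV6};

const DENDRITE_PORT: u16 = 12224;
const MGS_PORT: u16 = 12225;
const MGD_PORT: u16 = 4676;

struct SwitchZoneAddrs {
    dendrite: SocketAddrV6,
    mgs: SocketAddrV6,
    mgd: SocketAddrV6,
}

// Hypothetical stand-in for the internal DNS resolver: returns the
// (target IP, SRV port) pairs behind an SRV name.
fn resolve_srv(_name: &str) -> Vec<(Ipv6Addr, u16)> {
    unimplemented!("stand-in for the internal DNS resolver")
}

fn switch_zone_addrs() -> Vec<SwitchZoneAddrs> {
    resolve_srv("_dendrite._tcp.control-plane.oxide.internal")
        .into_iter()
        // The SRV record carries a port, but only the IP is kept...
        .map(|(ip, _srv_port)| SwitchZoneAddrs {
            // ...and the hardcoded production ports get reattached.
            dendrite: SocketAddrV6::new(ip, DENDRITE_PORT, 0, 0),
            mgs: SocketAddrV6::new(ip, MGS_PORT, 0, 0),
            mgd: SocketAddrV6::new(ip, MGD_PORT, 0, 0),
        })
        .collect()
}
```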
So how does DNS get filled in with these addresses and ports? In general, internal DNS contents get computed in two different places: (1) during rack setup, RSS computes them; and (2) after that, Nexus computes them from the current target blueprint during blueprint execution. In real systems, when RSS does this, the IPs are those of the switch zones it knows about and the TCP ports are the hardcoded ports for these services. There isn't really another approach here: unlike control plane zones, the switch zone is started before RSS is even running, so there's no way for RSS to configure which TCP ports it uses (as it does for control plane zones). In real systems, when blueprint execution computes internal DNS, it also uses the hardcoded ports. (The linked code does mention overrides for testing, but those are only used for unit tests. They're not present in the Nexus that gets spun up by the test suite, and I think that's for the best.)
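For illustration, here's roughly what that looks like on the blueprint-execution side. The names are invented (this is not the real blueprint-execution code); only the port constant and the record shape are taken from the DNS diff shown below.

```rust
// Hypothetical sketch: when internal DNS is recomputed from a blueprint
// today, the SRV port for a switch zone service comes from a hardcoded
// constant rather than from wherever that service is actually listening.

const DENDRITE_PORT: u16 = 12224;

struct SrvRecord {
    target: String,
    port: u16,
}

// Build the _dendrite._tcp record for one switch zone.
fn dendrite_srv_for_switch_zone(zone_name: &str) -> SrvRecord {
    SrvRecord {
        target: format!("{zone_name}.host.control-plane.oxide.internal"),
        // Always the production port: correct on real systems, wrong in the
        // test suite, where Dendrite listens on an arbitrary port.
        port: DENDRITE_PORT,
    }
}
```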
Which switch does each service correspond with?
As mentioned above, Nexus asks MGS which switch it's attached to, and then it assumes that all the other services at the same IP are talking to the same switch.
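A rough sketch of that inference, with invented names; the point is that the switch identity ends up keyed purely by IP address.

```rust
// Sketch of the colocation assumption: ask the MGS at each switch zone IP
// which switch it is attached to, then assume every other service at that
// same IP serves that same switch.
use std::collections::HashMap;
use std::net::Ipv6Addr;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum SwitchLocation {
    Switch0,
    Switch1,
}

// Hypothetical stand-in for the MGS API call that reports switch identity.
fn ask_mgs_which_switch(_mgs_ip: Ipv6Addr) -> SwitchLocation {
    unimplemented!("stand-in for an MGS client call")
}

fn map_switch_zones(switch_zone_ips: &[Ipv6Addr]) -> HashMap<Ipv6Addr, SwitchLocation> {
    let mut by_ip = HashMap::new();
    for &ip in switch_zone_ips {
        // Whatever the MGS at this IP reports is assumed to apply to the
        // Dendrite and mgd instances at the same IP.
        by_ip.insert(ip, ask_mgs_which_switch(ip));
    }
    by_ip
}
```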
The problem
None of this works in the test suite.
Problem 1: finding the IPs and ports of the networking services
The test harness sets up a virtual control plane that includes MGS and Dendrite, and those wind up running on arbitrary ports. Like everything else in the test suite, we do this in order to support concurrent execution of multiple tests.
The test harness plays the role of RSS in this context. It appears to configure internal DNS with the actual TCP ports of the MGS, Dendrite, and mgd instances that it started:
omicron/nexus/test-utils/src/lib.rs
Lines 748 to 750 in 032c556
self.dendrite.get(&switch_location).unwrap().port,
self.gateway.get(&switch_location).unwrap().port,
self.mgd.get(&switch_location).unwrap().port,
So initially, the TCP ports will be correct. However, as soon as the test system executes a blueprint, those TCP ports wind up changed to the wrong values (the hardcoded ones for production systems). I verified this with `cargo xtask omicron-dev run-all` (which runs basically the same environment as each test suite test):
$ cargo xtask omicron-dev run-all
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.66s
Running `target/debug/xtask omicron-dev run-all`
Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.17s
Running `target/debug/omicron-dev run-all`
omicron-dev: setting up all services ...
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.22889.0.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.22889.0.log"
DB URL: postgresql://root@[::1]:48588/omicron?sslmode=disable
DB address: [::1]:48588
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.22889.2.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.22889.2.log"
omicron-dev: Adding disks to first sled agent
omicron-dev: services are running.
omicron-dev: nexus external API: 127.0.0.1:12220
omicron-dev: nexus internal API: [::1]:12221
omicron-dev: cockroachdb pid: 22893
omicron-dev: cockroachdb URL: postgresql://root@[::1]:48588/omicron?sslmode=disable
omicron-dev: cockroachdb directory: /dangerzone/omicron_tmp/.tmpdq7MIy
omicron-dev: internal DNS HTTP: http://[::1]:53334
omicron-dev: internal DNS: [::1]:53652
omicron-dev: external DNS name: oxide-dev.test
omicron-dev: external DNS HTTP: http://[::1]:41750
omicron-dev: external DNS: [::1]:55935
omicron-dev: e.g. `dig @::1 -p 55935 test-suite-silo.sys.oxide-dev.test`
omicron-dev: management gateway: http://[::1]:47805 (switch0)
omicron-dev: silo name: test-suite-silo
omicron-dev: privileged user name: test-privileged
...
Enable the initial target blueprint:
$ ./target/debug/omdb --dns-server [::1]:53652 nexus blueprints list
note: Nexus URL not specified. Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
T ENA ID PARENT TIME_CREATED
* no bfd9cbc5-8f90-4839-9808-38db6dbbbf12 <none> 2025-09-04T21:26:33.764Z
$ ./target/debug/omdb --dns-server [::1]:53652 nexus blueprints target enable current -w
note: Nexus URL not specified. Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
set target blueprint bfd9cbc5-8f90-4839-9808-38db6dbbbf12 to enabled
Wait a second, then see that the internal DNS version has been bumped:
$ ./target/debug/omdb --dns-server [::1]:53652 db dns show
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:48588/omicron?sslmode=disable
note: database schema version matches expected (186.0.0)
GROUP ZONE ver UPDATED REASON
internal control-plane.oxide.internal 2 2025-09-04T21:28:25Z blueprint bfd9cbc5-8f90-4839-9808-38db6dbbbf12 (initial test blueprint)
external oxide-dev.test 2 2025-09-04T21:26:34Z create silo: "test-suite-silo"
$ ./target/debug/omdb --dns-server [::1]:53652 db dns diff internal 2
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:48588/omicron?sslmode=disable
note: database schema version matches expected (186.0.0)
DNS zone: control-plane.oxide.internal (Internal)
requested version: 2 (created at 2025-09-04T21:28:25Z)
version created by Nexus: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
version created because: blueprint bfd9cbc5-8f90-4839-9808-38db6dbbbf12 (initial test blueprint)
changes: names added: 6, names removed: 5
+ _dendrite._tcp (records: 1)
+ SRV port 12224 dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal
+ _mgd._tcp (records: 1)
+ SRV port 4676 dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal
+ _mgs._tcp (records: 1)
+ SRV port 12225 dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal
+ _nexus._tcp (records: 2)
+ SRV port 12223 a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f.host.control-plane.oxide.internal
+ SRV port 12221 e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c.host.control-plane.oxide.internal
+ a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f.host AAAA ::1
+ dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host AAAA ::2
- _dendrite._tcp (records: 1)
- SRV port 52689 dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal
- _mgd._tcp (records: 1)
- SRV port 45341 dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal
- _mgs._tcp (records: 1)
- SRV port 47805 dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host.control-plane.oxide.internal
- _nexus._tcp (records: 1)
- SRV port 12221 e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c.host.control-plane.oxide.internal
- dendrite-b6d65341-167c-41df-9b5c-41cded99c229.host AAAA ::1
The correct ports for the services started by the test suite have been replaced by the stock hardcoded ones.
This isn't a huge deal because we don't enable blueprint execution by default in tests, but I think it's fair to say this is at least partly working by accident.
Problem 2: which switch goes with each service?
In the test suite, all of these services run on localhost, which breaks Nexus's assumption that it can determine which switch a given Dendrite instance manages based on which MGS instance shares its IP address.
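A trivial illustration of why that inference falls apart (purely illustrative, not omicron code): anything keyed by IP address can't distinguish services that all listen on `::1`.

```rust
// Keying services by IP address stops working when everything in the test
// suite listens on ::1: the second insert silently overwrites the first,
// so two switches collapse into one entry.
use std::collections::HashMap;
use std::net::Ipv6Addr;

fn main() {
    let localhost = Ipv6Addr::LOCALHOST;
    let mut switch_by_ip: HashMap<Ipv6Addr, &str> = HashMap::new();
    switch_by_ip.insert(localhost, "switch0"); // services for switch0 at ::1
    switch_by_ip.insert(localhost, "switch1"); // services for switch1 also at ::1
    // Only one entry survives; a per-IP mapping can't tell the switches apart.
    assert_eq!(switch_by_ip.len(), 1);
}
```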
Proposed solution
For the problem of discovering switch zone services more reliably, we suggested:
- add a new field to the blueprint that specifies the IP address and TCP port for each switch service
- RSS can fill this in properly for the first blueprint because it has this information
- the test runner can also fill this in properly for the first blueprint because it has this information too
- we need to keep this up to date if scrimlets get moved around -- I can't remember from the call if we determined how to do this. (On real systems, the planner has the information to do this. In the test suite, though, I think that would cause it to do the wrong thing.)
- during blueprint execution, in computing DNS, Nexus would use these addresses and ports rather than inferring the switch zone addresses and hardcoding TCP ports
Recall that this problem is less urgent than the next one because currently the initial DNS contents in the test suite are fine and only blueprint execution (which is currently disabled out of the box) breaks them.
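As a sketch of what the blueprint field proposed above might look like (all names invented here, not actual omicron types):

```rust
// Hypothetical sketch of the proposed blueprint addition: the blueprint
// records the full socket address of each switch zone service, per switch.
use std::collections::BTreeMap;
use std::net::SocketAddrV6;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum SwitchLocation {
    Switch0,
    Switch1,
}

// One entry per switch: filled in by RSS (or the test harness) for the
// first blueprint, and kept up to date after that.
struct SwitchServiceAddrs {
    dendrite: SocketAddrV6,
    mgs: SocketAddrV6,
    mgd: SocketAddrV6,
}

struct BlueprintSwitchServices {
    by_switch: BTreeMap<SwitchLocation, SwitchServiceAddrs>,
}
```

Blueprint execution would then generate the switch service DNS records from these recorded addresses rather than combining the switch zone IPs with hardcoded ports.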
For the problem of figuring out which switch each service goes with:
- Dendrite should be able to tell clients directly which switch it's managing
- Dendrite should ask MGS for this information. (It already talks to its local MGS for other reasons.)
- Dendrite should support a start-time config option for the address/port of the MGS to use so that the test suite can point it at the MGS that it started, rather than assuming it should talk to the one on localhost and the hardcoded port.
- Rather than assuming the switch is determined by the IP address, Nexus should ask Dendrite which switch it's managing and use that
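To illustrate the shape of this proposal (entirely hypothetical -- Dendrite exposes no such endpoint today, and these names are invented):

```rust
// The idea: Dendrite learns its switch identity from its local MGS (whose
// address/port would become a start-time config option) and reports it to
// clients, so Nexus can ask each Dendrite directly instead of inferring
// the switch from a shared IP address.
use std::net::SocketAddrV6;

#[derive(Clone, Copy, Debug)]
enum SwitchLocation {
    Switch0,
    Switch1,
}

// What a "which switch am I managing?" response might carry.
struct SwitchIdentity {
    switch_location: SwitchLocation,
}

// The call Nexus would make for each Dendrite instance it finds in DNS,
// replacing the shared-IP-with-MGS inference.
fn dendrite_switch_identity(_dendrite: SocketAddrV6) -> SwitchIdentity {
    unimplemented!("stand-in for a new Dendrite API endpoint")
}
```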
Another option related to all this is:
- have the inventory collector ask each MGS which switch it's attached to and record that
- maybe do the same with Dendrite and the other switch zone services (after having located them using the generic _dendrite._tcp records)?
- have blueprint planner latch this state
- have blueprint execution create DNS names specific to each rack, switch, and service, like `$rack_id.switch_0._dendrite._tcp` (separate from the existing `_dendrite._tcp` records)
This way, Nexus would look up `ServiceName::Dendrite(RackId, SwitchLocation)` and get back exactly one thing: the right Dendrite for that switch in that rack. This would be more aligned with how we intended DNS to be used -- all SRV records under a given name should be fungible, which isn't currently the case for `_dendrite._tcp`.
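A sketch of what that per-rack, per-switch naming could look like (types invented here, simplified from the `ServiceName::Dendrite(RackId, SwitchLocation)` idea above; not the actual omicron `ServiceName` enum):

```rust
// Each (rack, switch) pair gets its own SRV name, so a lookup returns
// exactly one Dendrite and every record under a given name stays fungible.

#[derive(Clone, Copy, Debug)]
enum SwitchLocation {
    Switch0,
    Switch1,
}

enum ServiceName {
    // Existing-style generic name, where records for different switches mix.
    DendriteAny,
    // Proposed rack- and switch-specific variant.
    Dendrite { rack_id: String, switch: SwitchLocation },
}

impl ServiceName {
    // The SRV name under control-plane.oxide.internal.
    fn srv_name(&self) -> String {
        match self {
            ServiceName::DendriteAny => "_dendrite._tcp".to_string(),
            ServiceName::Dendrite { rack_id, switch } => {
                let switch = match switch {
                    SwitchLocation::Switch0 => "switch_0",
                    SwitchLocation::Switch1 => "switch_1",
                };
                // Mirrors the $rack_id.switch_0._dendrite._tcp shape above.
                format!("{rack_id}.{switch}._dendrite._tcp")
            }
        }
    }
}
```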