Conversation
gulducat
left a comment
There was a problem hiding this comment.
Lots of comments about style, because I love that kinda thing, but also a couple related to error-handling that I think need fixing.
Aside from that, enabling net.ipv4.conf.virbr0.route_localnet on my host was straightforward enough, but it took me some extra sleuthing to determine how to engage this code with a real job.
Agent config:
client {
host_network "loopback" {
cidr = "127.0.0.1/8"
}
}Job:
group "g" {
network {
port "ssh" {
host_network = "loopback"
to = 22
}
}
}We should prolly write this down.
| // Add the postrouting chain if it does not exist. | ||
| postroutingCreated, err := ensureIPTablesChain(ipt, iptablesNATTableName, postroutingIPTablesChainName) | ||
| if err != nil { | ||
| c.logger.Error("failed to create iptables chain", "chain", postroutingIPTablesChainName, "error", err) |
There was a problem hiding this comment.
Missing a return here, though the only result is a misleading Warn claiming that it is enabled.
| c.logger.Error("failed to create iptables chain", "chain", postroutingIPTablesChainName, "error", err) | |
| if err != nil { | |
| c.logger.Error("failed to create iptables chain", "chain", postroutingIPTablesChainName, "error", err) | |
| return |
These things are annoyingly easy to miss.
| // Ensure loopback port forwarding is setup | ||
| initLoopbackForwards.Do(c.enableLoopbackPortForwards) |
There was a problem hiding this comment.
This doesn't actually ensure it, since it's in a sync.Once, there's no way for the method to surface a failure... I guess c.enableLoopbackPortForwards could set a field on c that could be checked? or make this a funky little closure to catch the error? e.g.
var enableLoopbackErr error
initLoopbackForwards.Do(func() { enableLoopbackErr = c.enableLoopbackPortForwards() })There was a problem hiding this comment.
A robot friend also pointed out to me that this method being a package-level sync.Once means that if it fails, it will never have another chance to succeed, unless something external kicks over the driver process. Do we expect that to be reflective of real systems?
There was a problem hiding this comment.
Really it should just be synced since it's more likely that once the error is encountered the runtime config would likely get adjusted and the job resubmitted. The device enablement is in that same boat, though there it could just be that extra rule(s) end up getting added.
| if val == "1" { | ||
| return true | ||
| } |
There was a problem hiding this comment.
I noticed earlier that routeLocalnetAllPath is just routeLocalnetPathTemplate with %s -> "all", but I skated past it. Now in this method we're maintaining duplicate code to treat it as special when the only difference is the log message?
What if this instead took (devices ...string) and looped over them? Then the caller would be something like
// somewhere: const localnetAll = "all"
if !c.loopbackPortForwardsSupported(localnetAll, dstIface)We could still make the log message conditionally translate "all" to "global" if we want to retain that.
There was a problem hiding this comment.
I wanted the global to always be checked (not relying on being passed as an argument) because it always has precedence. It was originally just checking the global value but finding the device specific options the template + device option got tacked on. I've compressed the implementation down but still maintains the same behavior.
| d, err := os.Open(devPath) | ||
| if err != nil { | ||
| c.logger.Debug("could not open device localnet routing config file", "path", devPath, "error", err) | ||
| return false | ||
| } | ||
| defer d.Close() | ||
|
|
||
| content, err := io.ReadAll(d) |
There was a problem hiding this comment.
🏌️
| d, err := os.Open(devPath) | |
| if err != nil { | |
| c.logger.Debug("could not open device localnet routing config file", "path", devPath, "error", err) | |
| return false | |
| } | |
| defer d.Close() | |
| content, err := io.ReadAll(d) | |
| content, err := os.ReadFile(devPath) |
| for _, chain := range chains { | ||
| if chain == postroutingIPTablesChainName { | ||
| return true | ||
| } | ||
| } | ||
|
|
||
| return false |
There was a problem hiding this comment.
🏌️
| for _, chain := range chains { | |
| if chain == postroutingIPTablesChainName { | |
| return true | |
| } | |
| } | |
| return false | |
| return slices.Contains(chains, postroutingIPTablesChainName) |
| return "", err | ||
| } | ||
|
|
||
| checkAddr, err := netip.ParseAddr(ip) |
There was a problem hiding this comment.
unhandled error
| checkAddr, err := netip.ParseAddr(ip) | |
| checkAddr, err := netip.ParseAddr(ip) | |
| if err != nil { | |
| return "", err | |
| } |
| t.Run("direct", func(t *testing.T) { | ||
| testutil.RequireIPTables(t) |
There was a problem hiding this comment.
For another PR: This will skip the test if not-root or no iptables on the test runner. I'd like a way to ensure that it actually runs in CI, while skipping is OK on a developer machine. Maybe check an env var in the helper?
This branch seems not to have the test fix from #196 - the test run reports "DONE 0 tests in 31.075s"
There was a problem hiding this comment.
Yeah, I can spin out a quick PR to include an env check 👍
| iptablesInterfaceGetter: c.iptablesInterfaceGetter, | ||
| logger: c.logger, | ||
| netConn: conn, | ||
| routingInterfaceByIPGetter: c.routingInterfaceByIPGetter, |
There was a problem hiding this comment.
I suppose it's ok to exclude c.routeLocalnet{Path,Template} from Copy because they're not user-configurable, and they'll get defaults in practice if unset, but still seems like an oversight.
|
|
||
| // Still here, then add the rule. | ||
| if err := ipt.Append(iptablesNATTableName, postroutingIPTablesChainName, rule...); err != nil { | ||
| return fmt.Errorf("failed to add loopback rule for device: %w", err) |
There was a problem hiding this comment.
High-level food for thought, maybe to document?
robot observed that there's no teardown for this rule, which makes sense, as this is global/device-level and outlives any particular VM.
More broadly, having run this driver before without rebooting, the NOMAD_VT_PRT chain is still chilling in my iptables. Nomad proper also leaves its bridge networking chain, NOMAD-ADMIN, lying around, and Docker does it too, so there's precedence, and it doesn't hurt anything to have an empty chain, but I also don't think we have this written down anywhere? Stuff that we intentionally do not clean up.
Maybe this pattern is well-known in networking circles, and I just lack some common knowledge, but Docker does at least have a doc about it: https://docs.docker.com/engine/network/firewall-iptables/
There was a problem hiding this comment.
Sure, I can add some notes in the README updates. We have to leave them since tearing them down on shutdown would break networking to any tasks that were still running.
| fn(ipt.Delete(iptablesNATTableName, "OUTPUT", []string{"-j", outputIPTablesChainName}...)) | ||
| fn(ipt.ClearChain(iptablesNATTableName, outputIPTablesChainName)) | ||
| fn(ipt.DeleteChain(iptablesNATTableName, outputIPTablesChainName)) |
There was a problem hiding this comment.
Oof, if I run these tests on my machine alongside a running virt task, this helper sweeps these iptables out from under me. I suppose this is just an occupational hazard, and I should run these tests in a VM.
There was a problem hiding this comment.
Yeah, this makes it a bit tenuous running directly on the host. However, with the iptables check guarding the tests that actually interact with iptables, if you run as a non-elevated user it'll just result in the mocks running.
43ed5e9 to
b4bf7ae
Compare
Allow port forwards to the loopback device. Requires the host system to be properly configured to allow routing of localnet packets.
b4bf7ae to
31578e9
Compare
Allow port forwards to the loopback device. Requires the host system to be properly configured to allow routing of localnet packets.