-
Notifications
You must be signed in to change notification settings - Fork 292
fix(iroh): Also keep track of non-best path's validity (v0.35) #3398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: maint-0.35
Are you sure you want to change the base?
Conversation
…them if best addr is soon outdated
|
match self { | ||
Source::ReceivedPong => TRUST_UDP_ADDR_DURATION, | ||
// // TODO: Fix time | ||
// Source::BestCandidate => Duration::from_secs(60 * 60), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the BestCandidate
situation, how is that handled now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BestCandidate
used to be set in NodeUdpPaths::assign_best_addr_from_candidates_if_empty
, which in my experience was only called on the receive side the very first time you'd send data from the magic socket.
For some reason it doesn't have a best address set at that point, so it instead picks one from all the PathState
s it has and assigns the best it can find with Source::BestCandidate
.
I... am really unsure why in that special case it should trust this candidate for one hour. This seems ridiculously high and totally out of proportion with TRUST_UDP_ADDR_DURATION otherwise.
In practice I don't think this is all to bad (even if the best addr never expires for an hour). In theory that allows an attacker to waste lots of resources on our end if they manage to spoof an address from the other side.
Or it can mean we'd try sending on an address that isn't actually reachable anymore (doesn't pong) for quite a while before we fall back to the relay, but usually these cases would also coincide with the addrs in the call me maybe changing, thus that'd clear the best addr as well.
Idk. All in all, very weird code. But writing this comment makes me want to jump back into iroh 0.35 code and see why this NodeUdpPaths::assign_best_addr_from_candidates_if_empty
function was needed at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason it doesn't have a best address set at that point, so it instead picks one from all the
PathState
s it has and assigns the best it can find withSource::BestCandidate
.
I think the idea was to avoid flip flopping, and keeping this stable, unless new information comes in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't flip flop though, at least not more often than every TRUST_UDP_ADDR_DURATION
, since once you validate a path for that duration, you don't pick another until it expires :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that's incorrect (and we've observed more frequent flip-flopping), because we'll usually have multiple paths that hole-punch successfully, and they'll all turn Valid
at around the same time (and we'll choose the lowest latency ones).
Then they all expire at roughly the same time, at which point we flip between them back and forth a bit.
It's... really hard to fix this flip-flopping. (1) because both ends of the pipe will usually choose to switch the best address at around the same time (after TRUST_UDP_ADDR_DURATION) and (2) it's very hard to agree on "the same" path on both ends of the connection (which would be the most stable configuration) and (3) the latency of a path goes up the more it's used.
Ideally all of this becomes irrelevant once we integrate path validation with our QUIC stack. At that point there's no reason for both ends to "agree" on the path to send and receive on and it's should be rare to "accidentally" lose path validation packets.
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3398/docs/iroh/ Last updated: 2025-08-01T07:39:37Z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Contains some changes I've long wanted 😄
The path validity seems to be a much better abstraction than best-addr was. If this has seen enough testing I don't think there's anything blocking?
confirmed_at: Instant, | ||
trust_until: Instant, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems these always have the same relation. Why store both?
This is mostly an observation, it doesn't bother met that much if you want to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree.
There's some more "duplicated state" stored here with some fields in PongReply
.
I think refactoring this further, I'd clean up the PathValidity
struct a little more. At this point it's literally a combination of the fields in BestAddr
with another PongReply
.
Self { | ||
paths, | ||
best_addr, | ||
chosen_candidate: None, | ||
best_ipv4: best, // we only use ipv4 addrs in tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be worth a debug_assert!
perhaps? I wouldn't be surprised if I ended up writing tests with IPv6 at some point and it would catch me out. That probably works entirely fine - Quinn's test suite is entirely IPv6 and doesn't seem to cause issues anywhere.
…ith `PathValidity` (#3400) ## Description #3398 but now rebased on `main`. See its description for more information. There's only one small change: We don't need to think about `NodeMap::reset`, as that's removed in `main`. ## Breaking Changes None. ## Notes & open questions <!-- Any notes, remarks or open questions you have to make about the PR. --> ## Change checklist <!-- Remove any that are not relevant. --> - [x] Self-review.
c6b3906
to
8e4d739
Compare
Description
This is an alternative to #3396 but also refactors the code a bit more.
Instead of trying to switch to a different address when the best address becomes outdated soon, we instead track the path validity for all paths we have the same way.
This means the
BestAddr
struct and module completely disappears. Instead we replace it with aPathValidity
struct and module and track it insidePathState
.Then we implement the "stickyness" that the best address had (i.e. not recalculating the best address all the time and sticking to a path as long as it's valid) by tracking a
best: UdpSendAddr
and abest_ipv4: UdpSendAddr
inNodeUdpPaths
.This also tries to generally update the
NodeUdpPaths
when we receive data instead of when we try to send.This means that
NodeUdpPaths::get_send_addr
might be slightly outdated, but it'll snap back to the "correct" thing once we receive any pings or data or call me maybes etc.I think this is a much cleaner approach, albeit with a bigger diff.
Breaking Changes
There shouldn't be any breaking changes.
Notes & open questions
There's some stuff like clearing the best address when
NodeMap::reset
is called that I didn't implement. I'm not sure if there's any point to that code path, it seems to be only called onEndpoint::stop()
, so I don't know why we'd want to clear the state there.Maybe more? I'm curious about what you think.
Change checklist
quic-rpc
iroh-gossip
iroh-blobs
dumbpipe
sendme