Commit 0b29edb
dns-resolve: Do not treat never accessed responses as expired.
If a daemon has multiple remotes to connect to and it has already
reached a fairly high backoff interval on those connections, it is
possible that it will never be able to connect if the DNS TTL value is
too low.
For example, if ovn-controller has 3 remotes in the ovn-remote
configuration, each specified as a host name, the following happens
once it reaches the default max backoff of 8 seconds:
  1. Tries to connect to the first remote - sends async DNS request.
  2. Since jsonrpc/reconnect modules are not really aware of this, they
     just treat connection as temporarily failed - 8 second backoff.
  3. After the backoff - switching to a new remote - sending DNS request.
  4. Temporarily failing - 8 second backoff.
  5. Switching to the third remote - sending DNS request.
  6. Temporarily failing - 8 second backoff.
  7. Finally trying the first remote again - checking DNS.
  8. There is a processed response, but it is already 24 seconds old.
  9. If DNS TTL is lower than 24 seconds - consider expired - send
     a new DNS request.
 10. Go to step 2.
With that, if the DNS TTL is lower than 3x the backoff interval (24
seconds in this example), the process will never be able to connect
without some external help to break the loop.
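
The arithmetic of the loop above can be checked with a small model.
This is only an illustrative sketch, not OVS code: the function name
and its parameters are hypothetical, and it assumes that the DNS
response arrives immediately after the request and that a response
counts as expired only when its age exceeds the TTL.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REMOTES 16

/* Hypothetical model of the reconnect loop described above.
 * Returns true if, within 'max_rounds' passes over all the remotes,
 * the daemon ever finds a DNS response that is still within its TTL. */
bool
can_ever_connect(int n_remotes, int backoff_s, int ttl_s, int max_rounds)
{
    assert(n_remotes > 0 && n_remotes <= MAX_REMOTES);

    long long requested_at[MAX_REMOTES] = { 0 };
    bool requested[MAX_REMOTES] = { false };
    long long now = 0;

    for (int i = 0; i < n_remotes * max_rounds; i++) {
        int r = i % n_remotes;

        if (requested[r]) {
            long long age = now - requested_at[r];

            if (age <= ttl_s) {
                /* Response is still valid: connection can proceed. */
                return true;
            }
        }

        /* No usable response: send a (new) request, treat the attempt
         * as temporarily failed, and back off before the next remote. */
        requested[r] = true;
        requested_at[r] = now;
        now += backoff_s;
    }
    return false;
}
```

Under these assumptions, with 3 remotes and an 8 second backoff every
response is 24 seconds old when it is first looked at, so a TTL of 23
seconds never yields a usable response, while a TTL of 24 seconds does.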
A proper solution for this should include:
  1. Making jsonrpc and reconnect and all the users of these modules
     aware of the DNS request being made. This means introduction of
     a new RECONNECT state for DNS request and not switching to a new
     target while we're still in this state.
  2. Making the poll loop state machine properly react to DNS responses
     by waiting on the file descriptor provided by the unbound library.
However, such a solution would be very invasive to the code structure
and all the involved libraries, so it may not be something that we
would want to backport as a bug fix to stable branches.
Instead, make a much simpler change: allow never previously accessed
DNS replies to be used for a short period of time, so the loop can be
broken. It's not caching if we just requested the value but haven't
used it yet; it's a "transaction in progress" situation in which we can
use the response even if TTL is zero. Without a proper solution, though,
we can't be sure that the process will ever look at the result of the
asynchronous request, so we need an upper limit for such
"transactions in progress". Limit them to a fairly arbitrary, but
big enough, value of 5 minutes. In the worst case, where the address
actually goes stale between our request and the first access, we'll
try to use the stale value once and then re-request right away on
failure to connect.
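
The expiration rule described above might look roughly like the sketch
below. It is not the actual patch: the structure, the function names
and the constant are hypothetical. It also assumes that a never
accessed response stays usable for the longer of its TTL and the
5 minute limit, which is one possible reading of the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Upper limit, in seconds, for a "transaction in progress". */
#define UNUSED_RESPONSE_MAX_AGE_S (5 * 60)

/* Hypothetical cache entry for one resolved host name. */
struct dns_entry {
    long long received_at;  /* When the response was processed, seconds. */
    long long ttl_s;        /* TTL reported by the resolver, seconds. */
    bool accessed;          /* Has any caller used this response yet? */
};

/* A response that was already used expires with its TTL.  A response
 * that nobody has looked at yet is kept for at least the grace period,
 * even if its TTL is zero. */
bool
dns_entry_is_expired(const struct dns_entry *e, long long now)
{
    assert(e != NULL);

    long long age = now - e->received_at;
    long long lifetime = e->ttl_s;

    if (!e->accessed && lifetime < UNUSED_RESPONSE_MAX_AGE_S) {
        lifetime = UNUSED_RESPONSE_MAX_AGE_S;
    }
    return age > lifetime;
}

/* Returns true and marks the entry as accessed if it can be used.
 * On false, the caller is expected to send a new DNS request. */
bool
dns_entry_use(struct dns_entry *e, long long now)
{
    if (dns_entry_is_expired(e, now)) {
        return false;
    }
    e->accessed = true;
    return true;
}
```

In this sketch, a response with a 5 second TTL that is first looked at
24 seconds later is still used once. After that first access the
regular TTL check applies again, so a failed connection attempt leads
to a fresh request.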
This solution seems reasonable and simple enough to backport to stable
branches while working on the proper solution on main.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-June/053738.html
Acked-by: Eelco Chaudron <[email protected]>
Acked-by: Mike Pattrick <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
1 parent 5debc22 commit 0b29edb
1 file changed, 23 insertions(+), 3 deletions(-)