Add timer connection timeouts and automatic re-authentication #403
kezarjg wants to merge 2 commits into jeffpiazza:master from
Implement connection timeouts and retry delays to allow the timer to recover more quickly from server outages. Add automatic re-authentication when timer sessions expire to maintain connectivity without manual intervention.
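The PR's diff is not reproduced in this thread, so the following is only a minimal sketch of the general technique the description names: finite connect/read timeouts plus a bounded retry loop with a delay between attempts. The class name `TimerConnectionSketch`, the helper names, and the timeout values are all hypothetical; only `URLConnection.setConnectTimeout` and `setReadTimeout` are real JDK calls.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.concurrent.Callable;

class TimerConnectionSketch {
    // Hypothetical values; the PR's actual timeouts are not shown in this thread.
    static final int CONNECT_TIMEOUT_MS = 5_000;
    static final int READ_TIMEOUT_MS = 10_000;

    /** Opens (but does not yet connect) a URLConnection with finite timeouts. */
    public static URLConnection openWithTimeouts(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS); // give up if the TCP connect stalls
        conn.setReadTimeout(READ_TIMEOUT_MS);       // give up if the server stops responding
        return conn;
    }

    /**
     * Runs the request up to maxAttempts times, sleeping delayMs after each
     * IOException. Rethrows the last IOException if every attempt fails.
     */
    public static <T> T withRetries(Callable<T> request, int maxAttempts, long delayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs); // back off before the next attempt
                }
            }
        }
        throw last;
    }
}
```

Capping the number of attempts, rather than retrying forever, is one way to keep an unattended client from hammering a server that is never coming back.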
Thanks for this! Have you had a server crash or other outage from which this automatic recovery was necessary? (I'm curious what the circumstances of that would have been.)

Before looking more closely at the code, I'd like to understand better what problem you're trying to solve here. My first thought is that a server crash should be rare and is perhaps not worth adding special code for recovery. (I've also occasionally had the experience that a customer of an expired hosted instance continues to run a timer client that repeatedly tries to log in, creating a lot of noise in the server log files. Making the client try even harder in that case isn't something I want to encourage.)

Do you have a way to test this code, even manually? (I.e., can you describe a sequence of steps that exercises the timeout recovery and reauthentication code?)

I don't know offhand what the default connection timeout values are, but I would have assumed they were fairly reasonable. Have you had a different experience?

The "notauthorized" failure is more about having the wrong permissions than anything else. (I don't want some conscripted volunteer on the check-in desk accidentally deleting all the racing groups, e.g.) Having the session code go into a loop trying to log in again is not usually the right response. Also, the authorization is cookie-based, so I would expect a successful login wouldn't need to be repeated after a temporary connection loss. Again, do you have a particular experience where the current behavior is inadequate?

Nit: I was always taught that camel case puts capitals on word boundaries. As "re" is not a word, "reauthenticate" would be preferred over "reAuthenticate".
Thanks for the thoughtful questions. The motivation here is coming from operational experience rather than a theoretical failure mode. At the moment I host the website on a Docker Swarm that runs at a location remote from the derby itself. The timer runs headless at the venue and connects over the network to the server. In practice, that setup introduces a few realistic failure cases that I’ve actually encountered:
When any of those happen, the timer client can end up stuck waiting indefinitely for a connection that will never complete cleanly. Because the timer is headless, there's no operator there to notice and restart it, which has resulted in loops where the race is effectively blocked until someone (me) intervenes.

I agree that true server crashes should be rare, and I'm not trying to optimize for pathological cases. This is more about graceful recovery from short-lived disconnects in a distributed setup, especially when the client is unattended.

Our derby is this weekend. After the event, I can spend more time defining repeatable test scenarios. As for testing: right now, testing has been manual and situational, for example:
If you have a different approach you’d recommend for handling this kind of failure, I’d be very happy to explore that as well.
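On the question of what the default timeouts are: a small sketch like the one below can print them for a freshly opened `java.net.URLConnection`. The class name `DefaultTimeouts` and the URL are hypothetical. In the JDK's documented convention a value of 0 means "no timeout" (wait indefinitely), and 0 is what one would normally expect to see here unless the defaults have been overridden through system properties.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

class DefaultTimeouts {
    /** Returns {connectTimeout, readTimeout} in ms for an unconnected URLConnection. */
    public static int[] defaults(String url) throws IOException {
        // openConnection() only creates the object; no network I/O happens yet.
        URLConnection conn = URI.create(url).toURL().openConnection();
        return new int[] { conn.getConnectTimeout(), conn.getReadTimeout() };
    }

    public static void main(String[] args) throws IOException {
        int[] d = defaults("http://localhost/");
        System.out.println("connect=" + d[0] + " ms, read=" + d[1] + " ms");
    }
}
```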
Source: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/URLConnection.java

Totally fair on the camel case nit, and thanks for flagging it. As for reauthenticate, that's just me. I'm happy to adjust if you have a preference. I tend to think of "re" like a prefix (or even hyphenated as "re-"), so my camel case brain naturally splits it out. But I'm flexible and happy to align with whatever convention we're using.