
Add timer connection timeouts and automatic re-authentication #403

Open
kezarjg wants to merge 2 commits into jeffpiazza:master from kezarjg:timer-connection-timeouts

kezarjg commented Jan 19, 2026

Implement connection timeouts and retry delays to allow the timer to recover more quickly from server outages. Add automatic re-authentication when timer sessions expire to maintain connectivity without manual intervention.
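
A retry delay of the kind described above could look roughly like the following sketch. It is not the code from this PR: `RetryDemo`, `delayMs`, and `retry` are hypothetical names, and the doubling-with-a-cap policy and the attempt limit are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryDemo {
    // Delay before retrying after the given 0-based attempt:
    // doubles each time, never exceeding maxMs.
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMs);
    }

    // Runs the action, retrying on IOException up to maxAttempts times in total.
    // Other exception types propagate immediately.
    static <T> T retry(Callable<T> action, int maxAttempts, long baseMs, long maxMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMs(attempt, baseMs, maxMs));
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println(delayMs(attempt, 500, 8000));
        }
    }
}
```

With a base of 500 ms and a cap of 8000 ms, the delays grow 500, 1000, 2000, 4000 and then stay at 8000. Capping both the delay and the number of attempts keeps an unattended client from hammering a server that is never coming back.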

@jeffpiazza (Owner)

Thanks for this!

Have you had a server crash or other outage from which this automatic recovery was necessary? (I'm curious what the circumstances of that would have been.) Before looking more closely at the code, I'd like to understand better what problem you're trying to solve here. My first thought is that a server crash should be rare and is perhaps not worth adding special code for recovery.

(I've also occasionally had the experience that a customer of an expired hosted instance continues to run a timer client that repeatedly tries to log in, creating a lot of noise in the server log files. Making the client try even harder in that case isn't something I want to encourage.)

Do you have a way to test this code, even manually? (I.e., can you describe a sequence of steps that exercises the timeout recovery and reauthentication code?)

I don't know offhand what the default connection timeout values are, but I would have assumed they were fairly reasonable. Have you had a different experience?

The "notauthorized" failure is more about having the wrong permissions than anything else. (I don't want some conscripted volunteer on the check-in desk accidentally deleting all the racing groups, e.g.) Having the session code go into a loop trying to log in again is not usually the right response. Also, the authorization is cookie-based, so I would expect a successful login wouldn't need to be repeated after a temporary connection loss. Again, do you have a particular experience where the current behavior is inadequate?

Nit: I was always taught that camel case puts capitals on word boundaries. As "re" is not a word, "reauthenticate" would be preferred over "reAuthenticate".

kezarjg (Author) commented Jan 20, 2026

Thanks for the thoughtful questions. The motivation here comes from operational experience rather than a theoretical failure mode.

At the moment I host the website on a Docker Swarm that runs at a location remote from the derby itself. The timer runs headless at the venue and connects over the network to the server. In practice, that setup introduces a few realistic failure cases that I’ve actually encountered:

  • transient network interruptions at the venue
  • the Swarm rescheduling or restarting the server container on a different host

When either of those happens, the timer client can end up stuck waiting indefinitely for a connection that will never complete cleanly. Because the timer is headless, there’s no operator there to notice and restart it, which has resulted in stalls where the race is effectively blocked until someone (me) intervenes.

I agree that true server crashes should be rare, and I’m not trying to optimize for pathological cases. This is more about graceful recovery from short-lived disconnects in a distributed setup, especially when the client is unattended.

Our derby is this weekend; after the event, I can spend more time defining repeatable test scenarios. For now, testing has been manual and situational. For example:

  1. Start the timer client and connect it to the server
  2. Restart or reschedule the server container in the Swarm
  3. Without this change, observe that the timer stops progressing until it is manually restarted
  4. With this change, observe that the timer eventually detects the failure and reconnects automatically
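
The stall in step 3 can be approximated locally without a Swarm. The sketch below is an assumption-laden stand-in, not the project's test setup: it opens a listening socket that never accepts or replies, then checks that a client with a read timeout gives up instead of waiting forever. `StalledServerDemo` and `timesOutAgainstSilentServer` are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class StalledServerDemo {
    // Returns true if a request to a server that never replies
    // fails with a timeout once readMs has elapsed.
    static boolean timesOutAgainstSilentServer(int readMs) throws IOException {
        // Port 0 lets the OS pick a free port. The socket listens but never
        // calls accept(), so the TCP connection is established by the OS
        // and no HTTP response is ever sent.
        try (ServerSocket silent = new ServerSocket(0)) {
            URI uri = URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/");
            HttpURLConnection conn =
                    (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(readMs);
            try {
                conn.getResponseCode(); // blocks waiting for a response
                return false;           // unexpected: the silent server answered
            } catch (SocketTimeoutException expected) {
                return true;            // the timeout fired, so the caller can retry
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + timesOutAgainstSilentServer(500));
    }
}
```

Removing the `setReadTimeout` call from this sketch should reproduce the indefinite wait, since the read then has no deadline.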

If you have a different approach you’d recommend for handling this kind of failure, I’d be very happy to explore that as well.

On the default timeouts: HttpURLConnection does not set any. In the JDK source, both connectTimeout and readTimeout are declared as int fields with no initializer, so they default to 0, and the JavaDoc specifies that a timeout of 0 is interpreted as an infinite timeout. You can demonstrate this by trying to connect to http://192.0.2.1/derbynet; 192.0.2.1 is in the TEST-NET-1 range reserved for documentation (RFC 5737), which is not routed on the public internet.

Source: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/URLConnection.java
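
For illustration, explicit timeouts can be set with the standard `setConnectTimeout` and `setReadTimeout` methods. This is a minimal sketch rather than the code in this PR; the class name `TimeoutDemo`, the helper `openWithTimeouts`, and the 5 and 10 second values are assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class TimeoutDemo {
    // Prepares (but does not yet open) an HTTP connection with explicit timeouts.
    // A value of 0 for either timeout means "wait indefinitely".
    static HttpURLConnection openWithTimeouts(String url, int connectMs, int readMs)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(connectMs); // limit on establishing the TCP connection
        conn.setReadTimeout(readMs);       // limit on each blocking read of the response
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // openConnection() does no network I/O, so this runs offline.
        HttpURLConnection conn =
                openWithTimeouts("http://192.0.2.1/derbynet", 5000, 10000);
        System.out.println("connect timeout: " + conn.getConnectTimeout());
        System.out.println("read timeout: " + conn.getReadTimeout());
    }
}
```

Calling `conn.getResponseCode()` on the returned connection would then fail with a `SocketTimeoutException` once the connect timeout elapses against an unreachable address, instead of blocking until the operating system gives up.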

Totally fair on the camel case nit, and thanks for flagging it. Writing reAuthenticate is just my habit: I tend to think of "re" as a prefix (or even hyphenated as "re-"), so my camel case brain naturally splits it out. I’m flexible and happy to rename it to reauthenticate, or to align with whatever convention the project is using.
