[Bug] Pull Heartbeat handler intermittent failure after switch to scheduler #1345

scottf · 2025-07-01T17:25:35Z

The gist of the PR is that I think the scheduled heartbeat task was still running, essentially causing an infinite loop. This stems from the recent change where I swapped timers for scheduled tasks.

scottf · 2025-07-02T20:44:06Z

src/main/java/io/nats/client/impl/DispatcherFactory.java

+ * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! *
+ * WARNING: THIS CLASS IS PUBLIC BUT ITS API IS NOT GUARANTEED TO *
+ * BE BACKWARD COMPATIBLE AS IT IS INTENDED AS AN INTERNAL CLASS  *
+ * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! *


Just updating this comment to match the other one I added. No production code.

scottf · 2025-07-02T20:44:26Z

src/main/java/io/nats/client/impl/MessageManager.java

+    protected final AtomicBoolean hb;
+    protected final AtomicLong idleHeartbeatSettingMillis;
+    protected final AtomicLong alarmPeriodSettingNanos;
+    protected final AtomicReference<ScheduledTask> heartbeatTask;


Just made everything atomic since these values can change from different threads.

scottf · 2025-07-02T20:48:55Z

src/main/java/io/nats/client/impl/MessageManager.java

-                shutdownHeartbeatTimer();
+            ScheduledTask hbTask = heartbeatTask.get();
+            if (hbTask != null) {
+                hbTask.shutdown();


The gist of the PR is that I think the scheduled heartbeat task was still running, essentially causing an infinite loop. Some of the original code here reused timer tasks because they were much heavier (each had its own thread). A scheduled task is much more lightweight, since it is just a task added to the scheduler, so I simplified the code to close the open task (shutdownHeartbeatTimer does that) and make a new one.
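For illustration, the "close the open task and make a new one" idea can be sketched with plain `java.util.concurrent` types. This is a minimal sketch, not the jnats code: `HeartbeatTaskHolder`, `restart`, `shutdown` and `isRunning` are hypothetical names, and a `ScheduledFuture` stands in for the library's `ScheduledTask`.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the heartbeat bookkeeping; not the jnats MessageManager.
class HeartbeatTaskHolder {
    private final ScheduledExecutorService scheduler;
    // Atomic because the task can be replaced or shut down from different threads.
    private final AtomicReference<ScheduledFuture<?>> current = new AtomicReference<>();

    HeartbeatTaskHolder(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    // Schedule a fresh task and cancel the one it replaces, if any.
    // Nothing is reused: adding a task to the scheduler is cheap.
    void restart(long periodMillis, Runnable check) {
        ScheduledFuture<?> fresh = scheduler.scheduleAtFixedRate(
            check, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        ScheduledFuture<?> old = current.getAndSet(fresh);
        if (old != null) {
            old.cancel(false);
        }
    }

    // Cancel the open task, if any, and forget it.
    void shutdown() {
        ScheduledFuture<?> old = current.getAndSet(null);
        if (old != null) {
            old.cancel(false);
        }
    }

    boolean isRunning() {
        ScheduledFuture<?> f = current.get();
        return f != null && !f.isCancelled();
    }
}
```

Swapping the reference with `getAndSet` means at most one task is tracked at a time, so a stale task cannot keep firing after it has been replaced.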

scottf · 2025-07-02T20:52:36Z

src/main/java/io/nats/client/impl/MessageManager.java

                () -> {
                    long sinceLast = NatsSystemClock.nanoTime() - lastMsgReceivedNanoTime.get();
-                    if (sinceLast > currentAlarmPeriodNanos.get()) {
+                    if (sinceLast > alarmPeriodSettingNanos.get()) {
+                        shutdownHeartbeatTimer(); // a new one will get started when needed.


The second part is that when I get a heartbeat alarm, I make sure I shut down the timer/task. I end up calling handleHeartbeatError (see next line of code), which raises the error to the code that made the subscription to begin with (a simplified consumer, for instance). It's up to that code to know what to do; simplified, for instance, will just try to make another sub.
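The alarm path described above can be sketched in isolation. This is a hedged illustration, not the jnats implementation: `HeartbeatAlarm` and its two callbacks are hypothetical, the clock value is passed in so the logic is easy to follow, and the `shutdownTimer` callback stands in for `shutdownHeartbeatTimer()` while `raiseError` stands in for `handleHeartbeatError`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the alarm check; names do not come from jnats.
class HeartbeatAlarm {
    private final AtomicLong lastMsgReceivedNanos;
    private final AtomicLong alarmPeriodNanos;
    private final Runnable shutdownTimer; // stand-in for shutdownHeartbeatTimer()
    private final Runnable raiseError;    // stand-in for handleHeartbeatError

    HeartbeatAlarm(long alarmPeriodNanos, long nowNanos,
                   Runnable shutdownTimer, Runnable raiseError) {
        this.alarmPeriodNanos = new AtomicLong(alarmPeriodNanos);
        this.lastMsgReceivedNanos = new AtomicLong(nowNanos);
        this.shutdownTimer = shutdownTimer;
        this.raiseError = raiseError;
    }

    // Record that a message or heartbeat arrived.
    void messageReceived(long nowNanos) {
        lastMsgReceivedNanos.set(nowNanos);
    }

    // Body of the periodic task: alarm only when the gap exceeds the period.
    void check(long nowNanos) {
        long sinceLast = nowNanos - lastMsgReceivedNanos.get();
        if (sinceLast > alarmPeriodNanos.get()) {
            shutdownTimer.run(); // stop first; a new task gets started when needed
            raiseError.run();    // then hand the problem to the subscriber
        }
    }
}
```

Stopping the task before raising the error means the check cannot fire again while the subscriber is still deciding how to recover.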

scottf · 2025-07-02T20:54:48Z

src/main/java/io/nats/client/impl/NatsConnection.java

@@ -601,7 +601,7 @@ void tryToConnect(NatsUri cur, NatsUri resolved, long now) {

                if (pingMillis > 0) {
                    pingTask = new ScheduledTask(scheduledExecutor, pingMillis, () -> {
-                        if (isConnected()) {
+                        if (isConnected() && !isClosing()) {


When figuring this out I considered whether I had to somehow make the runnable part of the task aware that it should stop, similar to a keepGoing flag or handling an interrupt. I noticed that the tasks I care about are short-lived. I also noticed that there is no point in pinging if the connection is still open but in the process of closing, so I changed this.
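As a rough illustration of why a guard at the top of a short-lived task is enough, here is a minimal sketch. `PingGuardSketch` and its fields are hypothetical stand-ins for the connection state behind `isConnected()` and `isClosing()`, and the counter replaces actually sending a ping.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the flags stand in for the connection's real state.
class PingGuardSketch {
    final AtomicBoolean connected = new AtomicBoolean();
    final AtomicBoolean closing = new AtomicBoolean();
    final AtomicInteger pingsSent = new AtomicInteger();

    // Body of the periodic ping task. It is short-lived, so it needs no
    // keepGoing flag or interrupt handling: it checks state once and returns.
    void pingTick() {
        if (connected.get() && !closing.get()) {
            pingsSent.incrementAndGet(); // stand-in for sending a real ping
        }
    }
}
```

Each run re-reads the state, so a tick that lands while the connection is closing simply does nothing.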

scottf · 2025-07-02T20:57:03Z

src/main/java/io/nats/client/support/ScheduledTask.java

 */
 public class ScheduledTask implements Runnable {
+    private static final AtomicLong ID_GENERATOR = new AtomicLong();


I added methods to support testing and make this a little bit more robust.

MauriceVanVeen

LGTM

scottf added 4 commits July 1, 2025 13:25

Debug unit test hanging 01-001

7ecea84

Debug unit test hanging 01-002

68d8f14

Debug unit test hanging 01-003

a22ba34

Debug unit test hanging 01-004

e25be23

scottf changed the title ~~Debug unit test hanging 01-001~~ Debug unit test hanging 01 Jul 1, 2025

scottf added 4 commits July 1, 2025 15:29

Debug unit test hanging 01-005

a2ecfab

Debug unit test hanging 01-006

e73a861

Debug unit test hanging 01-007

033a667

Debug unit test hanging 01-008

39599ca

scottf changed the title ~~Debug unit test hanging 01~~ [Bug] Pull Heartbeat handler intermittent failure after switch to scheduler Jul 2, 2025

scottf added 2 commits July 2, 2025 16:20

Fix heartbeat handling

05bf7cf

Fix heartbeat handling

d35ccb3

scottf requested a review from MauriceVanVeen July 2, 2025 20:43

remove debug

ba9fc53

scottf commented Jul 2, 2025

MauriceVanVeen approved these changes Jul 3, 2025

scottf merged commit 5d1da2c into main Jul 3, 2025
5 checks passed

scottf deleted the debug-unit-test-hanging-01 branch July 3, 2025 12:14
