-
Notifications
You must be signed in to change notification settings - Fork 176
[Bug] Pull Heartbeat handler intermittent failure after switch to scheduler #1345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! * | ||
* WARNING: THIS CLASS IS PUBLIC BUT ITS API IS NOT GUARANTEED TO * | ||
* BE BACKWARD COMPATIBLE AS IT IS INTENDED AS AN INTERNAL CLASS * | ||
* !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just updating this comment to match the other one I added. No production code.
protected final AtomicBoolean hb; | ||
protected final AtomicLong idleHeartbeatSettingMillis; | ||
protected final AtomicLong alarmPeriodSettingNanos; | ||
protected final AtomicReference<ScheduledTask> heartbeatTask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just made everything atomic since these value could change from different threads.
shutdownHeartbeatTimer(); | ||
ScheduledTask hbTask = heartbeatTask.get(); | ||
if (hbTask != null) { | ||
hbTask.shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gist of the PR is that I think that the scheduled heartbeat task was still running causing essentially an infinite loop. Some of the original code here was to reuse timer tasks because they were much heavier (had their own thread). So I had code to reuse them. But this is so much more light weight, just adding a task to the scheduler, so I simplified the code to just close the open task (shutdownHeartbeatTimer does that) and make a new one.
() -> { | ||
long sinceLast = NatsSystemClock.nanoTime() - lastMsgReceivedNanoTime.get(); | ||
if (sinceLast > currentAlarmPeriodNanos.get()) { | ||
if (sinceLast > alarmPeriodSettingNanos.get()) { | ||
shutdownHeartbeatTimer(); // a new one will get started when needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second part here is that when I get a heartbeat alarm, I make sure I shutdown the timer/task. I end up calling handleHeartbeatError (see next line of code) which is raises the error to the code that made the subscription to begin with (simplified consumer for instance). It's their problem to know what to do, for instance simplified will just try to make another sub.
@@ -601,7 +601,7 @@ void tryToConnect(NatsUri cur, NatsUri resolved, long now) { | |||
|
|||
if (pingMillis > 0) { | |||
pingTask = new ScheduledTask(scheduledExecutor, pingMillis, () -> { | |||
if (isConnected()) { | |||
if (isConnected() && !isClosing()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When figuring this out I considered to see if I had to somehow make the runnable part of the task aware that it should stop, similar to to a keepGoing flag or handling an interrupt. I noticed that the tasks I care about are short lived. And then I noticed that there is no point in pinging if the connection is currently open but being closed so I changed this.
*/ | ||
public class ScheduledTask implements Runnable { | ||
private static final AtomicLong ID_GENERATOR = new AtomicLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added methods to support testing and make this a little bit more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The gist of the PR is that I think that the scheduled heartbeat task was still running causing essentially an infinite loop. This is due to the recent change where I switched out timers for scheduled tasks.