-
Notifications
You must be signed in to change notification settings - Fork 24
Fix sentinel reconnect loop due to unresponsive sentinel #76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…st in the endpoints list
|
@zuiderkwast Please have a look. |
zuiderkwast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I understand the problem. I have some small suggestions.
To be honest, trap_exit is not the perfect way to handle processes. It is easy to miss something. It would be better to use a supervisor. Maybe we can do that in the future...
| %% Current sentinel connection broken | ||
| handle_info({'EXIT', _Pid, {connection_error, _}}, S) -> | ||
| {noreply, S}; | ||
| handle_info({'EXIT', Pid, _Reason}, #eredis_sentinel_state{conn_pid = Pid} = S) -> | ||
| {noreply, S#eredis_sentinel_state{conn_pid = undefined}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When connection fails with reason econnrefused, eredis:start_link returns {error, econnrefused} and at the same time, the eredis_client exits with reason {connection_error, econnrefused}. The 'EXIT' message waits in the inbox and query_master connects to another Redis Sentinel node and stores the successful pid in conn_pid. When handle_info is called, the conn_pid doesn't match.
Is this correct?
I think we need a better way to distinguish between this 'EXIT' message and another linked process exiting. Maybe it's better to do an explicit receive {'EXIT', _, {connection_error, _}} -> ok end after eredis:start_link returns an error, just before we try the next connection?
| handle_info({'EXIT', Pid, Reason}, S) -> | ||
| ?LOG_ERROR("eredis_sentinel: Exit from pid ~p with reason ~p and state ~p", [Pid, Reason, S]), | ||
| {stop, normal, S}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this always an error?
When eredis is started with sentinel options and later eredis is stopped with eredis:stop(Pid), this message is logged? It's not an error. Also I don't think we should log the state. It looks like debugging.
How about instead exit with the same reason as the linked process? Then an error will be logged automatically if Reason is anything other than normal.
| handle_info({'EXIT', Pid, Reason}, S) -> | |
| ?LOG_ERROR("eredis_sentinel: Exit from pid ~p with reason ~p and state ~p", [Pid, Reason, S]), | |
| {stop, normal, S}; | |
| handle_info({'EXIT', _Pid, Reason}, S) -> | |
| {stop, Reason, S}; |
| process_flag(trap_exit, false), | ||
| ?assertEqual(died, IsDead). | ||
|
|
||
| t_reconnect_success_on_sentinel_connection_break_mix_endpoints(Config) when is_list(Config) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some comments inside this test case to make it easier to understand what's happening.
Specifically handle connection errors for sentinel process to prevent termination.
Fixes #75