Skip to content

Conversation

@dhruvjain99
Copy link

Specifically handle connection errors for sentinel process to prevent termination.

Fixes #75

@dhruvjain99
Copy link
Author

@zuiderkwast Please have a look.

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand the problem. I have some small suggestions.

To be honest, trap_exit is not the perfect way to handle processes. It is easy to miss something. It would be better to use a supervisor. Maybe we can do that in the future...

Comment on lines 115 to 119
%% Current sentinel connection broken
handle_info({'EXIT', _Pid, {connection_error, _}}, S) ->
{noreply, S};
handle_info({'EXIT', Pid, _Reason}, #eredis_sentinel_state{conn_pid = Pid} = S) ->
{noreply, S#eredis_sentinel_state{conn_pid = undefined}};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When connection fails with reason econnrefused, eredis:start_link returns {error, econnrefused} and at the same time, the eredis_client exits with reason {connection_error, econnrefused}. The 'EXIT' message waits in the inbox and query_master connects to another Redis Sentinel node and stores the successful pid in conn_pid. When handle_info is called, the conn_pid doesn't match.

Is this correct?

I think we need a better way to distinguish between this 'EXIT' message and another linked process exiting. Maybe it's better to do an explicit receive {'EXIT', _, {connection_error, _}} -> ok end after eredis:start_link returns an error, just before we try the next connection?

Comment on lines +120 to 122
handle_info({'EXIT', Pid, Reason}, S) ->
?LOG_ERROR("eredis_sentinel: Exit from pid ~p with reason ~p and state ~p", [Pid, Reason, S]),
{stop, normal, S};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always an error?

When eredis is started with sentinel options and later eredis is stopped with eredis:stop(Pid), this message is logged? It's not an error. Also I don't think we should log the state. It looks like debugging.

How about instead exit with the same reason as the linked process? Then an error will be logged automatically if Reason is anything other than normal.

Suggested change
handle_info({'EXIT', Pid, Reason}, S) ->
?LOG_ERROR("eredis_sentinel: Exit from pid ~p with reason ~p and state ~p", [Pid, Reason, S]),
{stop, normal, S};
handle_info({'EXIT', _Pid, Reason}, S) ->
{stop, Reason, S};

process_flag(trap_exit, false),
?assertEqual(died, IsDead).

t_reconnect_success_on_sentinel_connection_break_mix_endpoints(Config) when is_list(Config) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments inside this test case to make it easier to understand what's happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sentinel client gets stuck in reconnect loop

2 participants