Skip to content

Conversation

solsson
Copy link
Contributor

@solsson solsson commented Oct 16, 2017

As suggested in #73.

Causes warnings in logs, as before.

@elm-
Copy link
Contributor

elm- commented Oct 16, 2017

Just to be fair, to have the exact same thing the timeout should be 1, the netcat was waiting for 1 second as well (-w). Default behaviour on Kubernetes is to fail after 3 probes with a timeout of 1 second. For a TCP connect 1 second should suffice, but I'm no Kafka expert, so not sure what guarantees are typical here. Internally Kafka clients I see are working with timeouts of a minute before failing, so 1 second or even 10 maybe even to short.

@solsson
Copy link
Contributor Author

solsson commented Nov 7, 2017

Let's keep it at 10 then. If the pod's node is at 100% CPU due to other containers, it will only hurt cluster health to kill the broker. On the other hand we wouldn't want to be near client's limits. Ideally we're able to kill and restart before clients time out.

Actually, lets reduce the timeout to 9s. Interval is 10s. Zk logs warnings at TCP conversations that don't make sense, so it helps if probes are very regular.

and trust alarms on Under-replicated Partitions to let us know
when something is really wrong.

Do clients actually care about Readiness?

The bootstrap service (#52)
will definitely care, which is good.

The `broker` service, that the StatefulSet manifest depends on
(https://github.com/Yolean/kubernetes-kafka/blob/v2.1.0/50kafka.yml#L7)
for naming, is without `publishNotReadyAddresses`.

Clients will bootstrap, get the individual DNS names of brokers,
resolve those addresses and connect directly to pods.
@solsson
Copy link
Contributor Author

solsson commented Nov 7, 2017

For a minute there I was thinking we had liveness probes. I don't think we should, as with the current scope ops will have to manually respond to failure modes such as under-replicated partitions. If we can do Preferred Replica Election manually we can also kill pods manually.

@solsson solsson added this to the v3.0 milestone Nov 7, 2017
@solsson solsson merged commit 3867c84 into master Nov 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants