Commit 5bde5e3
feat: warn on probe failing
We have a fairly common occurrence of KSM getting restarted because its liveness probe fails. This is seemingly caused by rolling deploys of the backing API server, but we don't know exactly why. This change makes it more visible what kind of response the API server returns when a probe fails. That, in turn, could help us either tweak the probes to weather situations where the API server is unavailable or overwhelmed (it's entirely possible it's returning KSM a bunch of 429s), or tell us whether we have a larger problem at hand.


pkg/app/server.go (3 additions, 1 deletion)
@@ -513,8 +513,10 @@ func buildTelemetryServer(registry prometheus.Gatherer, authFilter bool, kubeCon
 
 func handleClusterDelegationForProber(client kubernetes.Interface, probeType string) http.HandlerFunc {
 	return func(w http.ResponseWriter, _ *http.Request) {
-		got := client.CoreV1().RESTClient().Get().AbsPath(probeType).Do(context.Background())
+		var statusCode int
+		got := client.CoreV1().RESTClient().Get().AbsPath(probeType).Do(context.Background()).StatusCode(&statusCode)
 		if got.Error() != nil {
+			klog.Warningf("Failed to contact API server for %s: got %d", probeType, statusCode)
 			w.WriteHeader(http.StatusServiceUnavailable)
 			w.Write([]byte(http.StatusText(http.StatusServiceUnavailable)))
 			return
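
For context, here is a minimal, self-contained sketch of the delegated-probe pattern the handler above implements. The wiring in main (in-cluster config, the /livez and /readyz routes, the :8080 listen address) and the plain 200 OK success path are assumptions for illustration, since the diff cuts off after the error branch; the client-go and klog calls are the same ones the diff uses.

package main

import (
	"context"
	"net/http"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/klog/v2"
)

// probeHandler delegates a liveness/readiness check to the API server by
// GETting the given path (e.g. /livez) and mirroring success or failure.
func probeHandler(client kubernetes.Interface, probeType string) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		// StatusCode records the HTTP status code even when the request
		// errors, so an overloaded API server answering 429 shows up in
		// the warning below instead of being swallowed.
		var statusCode int
		got := client.CoreV1().RESTClient().Get().AbsPath(probeType).Do(context.Background()).StatusCode(&statusCode)
		if got.Error() != nil {
			klog.Warningf("Failed to contact API server for %s: got %d", probeType, statusCode)
			w.WriteHeader(http.StatusServiceUnavailable)
			w.Write([]byte(http.StatusText(http.StatusServiceUnavailable)))
			return
		}
		// Assumed success path: report healthy (the diff is truncated
		// before KSM's actual success handling).
		w.WriteHeader(http.StatusOK)
		w.Write([]byte(http.StatusText(http.StatusOK)))
	}
}

func main() {
	cfg, err := rest.InClusterConfig() // hypothetical wiring; KSM builds its config elsewhere
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	http.Handle("/livez", probeHandler(client, "/livez"))
	http.Handle("/readyz", probeHandler(client, "/readyz"))
	klog.Fatal(http.ListenAndServe(":8080", nil))
}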
