Hello,
it appears that the LAPI probes are not configurable, as they are hardcoded in the template without conditionals or values apart from the TLS setting:
From `helm-charts/charts/crowdsec/templates/lapi-deployment.yaml` (lines 119 to 157 in 7b556d5):

```yaml
livenessProbe:
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
  httpGet:
    path: /health
    port: lapi
    {{- if .Values.tls.enabled }}
    scheme: HTTPS
    {{- else }}
    scheme: HTTP
    {{- end }}
readinessProbe:
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
  httpGet:
    path: /health
    port: lapi
    {{- if .Values.tls.enabled }}
    scheme: HTTPS
    {{- else }}
    scheme: HTTP
    {{- end }}
startupProbe:
  failureThreshold: 30
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
  httpGet:
    path: /health
    port: lapi
    {{- if .Values.tls.enabled }}
    scheme: HTTPS
    {{- else }}
    scheme: HTTP
    {{- end }}
```
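For reference on how those defaults behave: liveness and readiness use the same `failureThreshold: 3` and `periodSeconds: 10`, so the pod becomes eligible for a liveness restart roughly 3 × 10 = 30 seconds after `/health` starts failing, which is the same moment it is first observed as not-ready.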
The default configuration is also problematic because it doesn't leave any time for the API to be observed as unavailable before the pod gets killed. It would be better practice for the liveness probe's failureThreshold to be higher than the readiness probe's; from the docs:
> A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold. This ensures that the pod is observed as not-ready for some period of time before it is hard killed.
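As a purely illustrative example of that pattern (the numbers here are made up, not a suggestion for the chart's defaults):

```yaml
# Illustrative only: same low-cost /health endpoint for both probes,
# but the liveness probe tolerates more consecutive failures, so the
# pod is not-ready for a while before it is restarted.
readinessProbe:
  httpGet:
    path: /health
    port: lapi
  periodSeconds: 10
  failureThreshold: 3     # not-ready after ~30s of failed checks
livenessProbe:
  httpGet:
    path: /health
    port: lapi
  periodSeconds: 10
  failureThreshold: 10    # restarted only after ~100s of failed checks
```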
More generally, a liveness probe is not necessarily a good idea unless an application is known to be prone to deadlocks or some other unrecoverable state that requires a restart. Try a DDG search for "why is a livenessProbe a bad idea" for some interesting reading.
In our use case, we need to disable the liveness probe while we investigate a network issue in the cluster. When the LAPI pod is killed, it stays in the Terminating state for a long time (several minutes; we will try to look into why) and, because it uses ReadWriteOnce PVCs, it prevents a new pod from starting, causing unnecessarily long downtime.
So, IMO, the LAPI probes should be configurable, as they already are in the agent deployment, by the way:
From `helm-charts/charts/crowdsec/templates/agent-deployment.yaml` (lines 162 to 174 in 7b556d5):

```yaml
{{ if .Values.agent.livenessProbe }}
livenessProbe:
{{ toYaml .Values.agent.livenessProbe | indent 10 }}
{{ end }}
{{ if .Values.agent.readinessProbe }}
readinessProbe:
{{ toYaml .Values.agent.readinessProbe | indent 10 }}
{{ end }}
{{ if .Values.agent.startupProbe }}
startupProbe:
{{ toYaml .Values.agent.startupProbe | indent 10 }}
{{ end }}
{{ end }}
```
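A rough sketch of what the same pattern could look like in lapi-deployment.yaml (the `lapi.livenessProbe` / `lapi.readinessProbe` / `lapi.startupProbe` value keys are just an assumption on my side; the probes currently hardcoded in the template could become the defaults in values.yaml):

```yaml
{{ if .Values.lapi.livenessProbe }}
livenessProbe:
{{ toYaml .Values.lapi.livenessProbe | indent 10 }}
{{ end }}
{{ if .Values.lapi.readinessProbe }}
readinessProbe:
{{ toYaml .Values.lapi.readinessProbe | indent 10 }}
{{ end }}
{{ if .Values.lapi.startupProbe }}
startupProbe:
{{ toYaml .Values.lapi.startupProbe | indent 10 }}
{{ end }}
```

With a guard like that, setting the corresponding value to null/empty would disable the probe entirely, which is exactly what we need for the investigation described above.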
If you are open to it, I can try to propose a PR.
Thank you very much!