Description
Dear OpenCost maintainers,
Since last week our OpenShift cluster has been showing a degradation warning claiming that only 50% of the apiservers are responding.
This appears to be related to metrics exposed by OpenCost: Prometheus scrapes them, and they are then returned by the query used for this degradation detection.
After we explicitly disabled the emission of pod annotations, namespace annotations and KSM v1 metrics, the error vanished. We used these Helm values:
opencost:
  metrics:
    serviceMonitor:
      enabled: true
    kubeStateMetrics:
      emitPodAnnotations: false
      emitNamespaceAnnotations: false
      emitKsmV1Metrics: false
The following lines appeared in the deployment:
- name: EMIT_POD_ANNOTATIONS_METRIC
  value: 'false'
- name: EMIT_NAMESPACE_ANNOTATIONS_METRIC
  value: 'false'
- name: EMIT_KSM_V1_METRICS
  value: 'false'
I would like to see this added to the documentation that @mittal-ishaan was working on IIRC.
The query that went wrong was this:
count(kube_pod_labels{label_app="openshift-kube-apiserver", label_apiserver="true", namespace="openshift-kube-apiserver" })
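Before we introduced the workaround described above, this query returned 6, while only three apiserver pods were actually running. Because the count was twice the real number of pods, the check concluded that only 50% of them were working and raised the degradation warning.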
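To illustrate the arithmetic, here is a minimal Python sketch of how the count can double. It assumes (we have not verified this in the OpenCost code) that OpenCost emitted its own `kube_pod_labels` series alongside those from kube-state-metrics. The pod names and `job` label values are hypothetical.

```python
from collections import Counter

# Hypothetical result set for the kube_pod_labels query: three apiserver pods,
# each reported once per scrape source. The job names are assumptions.
series = [
    {"pod": f"kube-apiserver-master-{i}", "job": job}
    for job in ("kube-state-metrics", "opencost")
    for i in range(3)
]


def count_series(result):
    """Mimic count(kube_pod_labels{...}): counts series, not pods."""
    return len(result)


def count_distinct_pods(result):
    """Count each pod once, regardless of how many sources report it."""
    return len({s["pod"] for s in result})


def series_per_job(result):
    """Show which scrape source contributes how many series."""
    return Counter(s["job"] for s in result)


print(count_series(series))          # 6 series in total
print(count_distinct_pods(series))   # 3 pods actually exist
print(dict(series_per_job(series)))  # 3 series from each source
```

In PromQL, a similar breakdown could be obtained by grouping the same selector with `count by (job) (...)`, assuming the two sources differ in their `job` label.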
Kind Regards,
Johannes