OpenCost metrics interfere with OpenShift's "degraded control plane" detection? #249

@kastl-ars

Dear OpenCost maintainers,

Since last week, our OpenShift cluster has been showing a degradation warning stating that only 50% of the API servers are responding.

This turns out to be related to metrics exposed by OpenCost: Prometheus scrapes them, and they are then returned by the query used for this degradation detection.

After we explicitly disabled the emission of pod annotation metrics, namespace annotation metrics, and KSM v1 metrics with the following configuration, the error vanished:

  opencost:
    metrics:
      serviceMonitor:
        enabled: true
      kubeStateMetrics:
        emitPodAnnotations: false
        emitNamespaceAnnotations: false
        emitKsmV1Metrics: false

With this change, the following environment variables appeared in the OpenCost deployment:

        - name: EMIT_POD_ANNOTATIONS_METRIC
          value: 'false'
        - name: EMIT_NAMESPACE_ANNOTATIONS_METRIC
          value: 'false'
        - name: EMIT_KSM_V1_METRICS
          value: 'false'
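To double-check that the workaround actually reached the running pods, one could inspect the Deployment manifest, for example the JSON produced by `kubectl get deployment <name> -n <namespace> -o json`. The following is a minimal sketch, not part of OpenCost: the helper name `vars_not_disabled` is hypothetical, and it only assumes the standard Kubernetes Deployment layout (`spec.template.spec.containers[].env[]` with `name`/`value`) plus the three variable names listed above.

```python
# The three variables that appeared in the OpenCost Deployment once the
# corresponding settings were turned off (names taken from this issue).
REQUIRED_FALSE = {
    "EMIT_POD_ANNOTATIONS_METRIC",
    "EMIT_NAMESPACE_ANNOTATIONS_METRIC",
    "EMIT_KSM_V1_METRICS",
}


def vars_not_disabled(deployment: dict) -> set:
    """Hypothetical helper: return the variables that no container sets to 'false'.

    An empty result means all three emitters are disabled in the manifest.
    """
    containers = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    disabled = set()
    for container in containers:
        for var in container.get("env", []):
            # Env values are strings in the manifest, e.g. 'false'.
            if var.get("name") in REQUIRED_FALSE and str(var.get("value", "")).lower() == "false":
                disabled.add(var["name"])
    return REQUIRED_FALSE - disabled
```

Usage: load the saved manifest with `json.load(...)` and pass the resulting dict to `vars_not_disabled`; any name it returns is still enabled (or missing) in that Deployment.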

I would like to see this added to the documentation that, IIRC, @mittal-ishaan was working on.

This is the query that returned the wrong result:

  count(kube_pod_labels{label_app="openshift-kube-apiserver", label_apiserver="true", namespace="openshift-kube-apiserver"})

Before we applied the workaround described above, this query returned 6 pods, although only 3 were actually running. Hence the degradation warning: only 50% of the counted API servers appeared to be working.
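A plausible explanation for 6 instead of 3 (an assumption, not confirmed here) is that PromQL `count()` counts time series rather than pods: if both kube-state-metrics and OpenCost export a `kube_pod_labels` series for each API server pod, every pod is counted twice. The sketch below simulates that arithmetic in plain Python. The job names and pod names are hypothetical, and the deduplicating variant corresponds to a query such as `count(count by (namespace, pod) (kube_pod_labels{...}))`, which is only an illustration, not the query OpenShift uses.

```python
from collections import namedtuple

# Simplified stand-in for a Prometheus series: just the identifying labels.
# The job values below are illustrative assumptions.
Series = namedtuple("Series", ["job", "namespace", "pod"])


def count_series(series):
    """Like PromQL count(): one per series, duplicates from other exporters included."""
    return len(series)


def count_distinct_pods(series):
    """Like count(count by (namespace, pod) (...)): one per distinct pod."""
    return len({(s.namespace, s.pod) for s in series})


NS = "openshift-kube-apiserver"
PODS = ["kube-apiserver-node-a", "kube-apiserver-node-b", "kube-apiserver-node-c"]  # hypothetical

# Each exporter emits one series per running API server pod.
series = [Series(job, NS, pod) for job in ("kube-state-metrics", "opencost") for pod in PODS]

print(count_series(series))         # 6
print(count_distinct_pods(series))  # 3
```

With 3 pods running against a count of 6, the ratio is 3 / 6 = 50%, which matches the warning; disabling the OpenCost emitters removes the second set of series.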

Kind regards,
Johannes
