
[Umbrella issue] How do we monitor k8s-infra? #2588

@ameukam

We initially had this conversation in #401.

Also kubernetes/test-infra#23317 (comment):

FYI @ameukam we don't have this feature enabled in kubernetes.io at the moment but will want to take a look at it soon

Some questions from thockin:

Cluster monitoring
a) What should we use?
- GKE workload metrics: https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#workload-metrics
- Managed Service for Prometheus: https://cloud.google.com/stackdriver/docs/managed-prometheus
b) How do we set it up with git-ops?
- #1376
- #1624
c) What exactly are we concerned about (signals)?
d) How are alerts delivered to a group of people?
e) How do we manage that group?
f) Do we need an on-call rotation?
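
To make questions (d) and (e) easier to discuss, here is a minimal sketch of label-based alert routing. Nothing here is an existing k8s-infra component: the `Alert` fields, the `ROUTES` table, the `example.com` group addresses, and the rule that only `critical` alerts page someone are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A fired alert, reduced to the fields needed for routing (hypothetical shape)."""
    name: str
    severity: str   # assumed values: "critical", "warning", "info"
    component: str  # assumed values: "cluster", "app", "quota"


# Hypothetical routing table: component -> mailing group that receives the alert.
# Membership of each group would be managed wherever groups are managed today.
ROUTES = {
    "cluster": "cluster-alerts@example.com",
    "quota": "quota-alerts@example.com",
}

# Hypothetical catch-all group for alerts with no dedicated route.
DEFAULT_GROUP = "infra-alerts@example.com"


def route(alert: Alert) -> str:
    """Return the group address that should receive this alert."""
    return ROUTES.get(alert.component, DEFAULT_GROUP)


def should_page(alert: Alert) -> bool:
    """Assumed policy: only critical alerts would page an on-call person."""
    return alert.severity == "critical"


if __name__ == "__main__":
    alert = Alert(name="NodeNotReady", severity="critical", component="cluster")
    print(route(alert), should_page(alert))
```

Keeping the table in a plain file would let group routing be reviewed like any other git-ops change, which ties back to question (b).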

App monitoring
a) Same tool as cluster monitoring?
b) What is the minimum expectation for an app to be deployed into the community space?
c) How do we manage groups of alerts for each app (ggroups?)?
d) How do we manage on-call for each app?

GCP quotas monitoring
How do we monitor them?
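
One possible starting point, sketched under assumptions: the Compute Engine project resource reports quotas as entries with `metric`, `limit`, and `usage` fields (for example in the JSON output of `gcloud compute project-info describe`). The helper below takes such a list and flags quotas at or above a utilization threshold. The function name, the 80% default, and the sample data are hypothetical; fetching the data and delivering the result are left out.

```python
from typing import Dict, List, Tuple


def quotas_over_threshold(
    quotas: List[Dict], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (metric, utilization) pairs at or above `threshold`, highest first.

    Each entry is assumed to look like {"metric": str, "limit": number, "usage": number}.
    Entries with a non-positive limit are skipped to avoid division by zero.
    """
    flagged = []
    for quota in quotas:
        limit = quota.get("limit", 0)
        if limit <= 0:
            continue
        utilization = quota.get("usage", 0) / limit
        if utilization >= threshold:
            flagged.append((quota["metric"], utilization))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"metric": "CPUS", "limit": 100, "usage": 85},
        {"metric": "IN_USE_ADDRESSES", "limit": 8, "usage": 2},
        {"metric": "SSD_TOTAL_GB", "limit": 0, "usage": 0},
    ]
    for metric, utilization in quotas_over_threshold(sample):
        print(f"{metric}: {utilization:.0%} of quota used")
```

A periodic job could run a check like this per project and hand the flagged quotas to whatever alerting path is chosen for cluster monitoring.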

More questions can be added.

/milestone v1.23
/area infra

Metadata

Assignees

No one assigned

    Labels

    area/infra: Infrastructure management, infrastructure design, code in infra/
    lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
    priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
    sig/k8s-infra: Categorizes an issue or PR as relevant to SIG K8s Infra.

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests