feat: add K8s SRE Incident Response dashboard by infrawithshobhit · Pull Request #376 · SigNoz/dashboards

infrawithshobhit · 2026-06-17T16:31:59Z

Dashboard: K8s SRE Incident Response

A Kubernetes dashboard built for on-call triage, based on a real-life example.

Q: Why this dashboard?
Most K8s dashboards show everything, and during an active incident that is just noise.
This one surfaces 8 signals that answer two questions: what is broken right now, and why?

Panels included

Pod Restart Rate — early crash loop detection
Node CPU Pressure — scheduling risk threshold (warn: 80%, crit: 90%)
Node Memory Pressure — OOM risk (warn: 85%, crit: 95%)
Pending Pods — scheduling failure indicator
Container Restarts / OOMKills — memory leak signal
p99 Latency — primary SLI for SLO tracking
5XX Error Rate — error budget burn rate
PVC Storage Usage — proactive storage incident prevention
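A minimal Python sketch of the logic behind three of the panels above. The CPU and memory warn/crit values come from the panel list. Everything else is an illustrative assumption and not the dashboard's actual query definitions: the inclusive threshold boundaries, the hypothetical helper names (`classify`, `restart_rate`, `burn_rate`), the `(unix_seconds, cumulative_restart_count)` sample format, and the 99.9% SLO target used in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    warn: float  # utilisation % at which the panel turns amber
    crit: float  # utilisation % at which the panel turns red


# Threshold values taken from the panel list above.
NODE_CPU = Thresholds(warn=80.0, crit=90.0)
NODE_MEMORY = Thresholds(warn=85.0, crit=95.0)


def classify(utilisation_pct: float, t: Thresholds) -> str:
    """Map a utilisation percentage to ok / warning / critical.

    Assumption: boundaries are inclusive (exactly 80.0% CPU is a warning).
    """
    if utilisation_pct >= t.crit:
        return "critical"
    if utilisation_pct >= t.warn:
        return "warning"
    return "ok"


def restart_rate(samples: list[tuple[float, int]]) -> float:
    """Restarts per minute from (unix_seconds, cumulative_restart_count) samples.

    A drop in the cumulative count (e.g. the pod was recreated) is treated
    as a counter reset, so the new value is counted as the increase.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    window_minutes = (samples[-1][0] - samples[0][0]) / 60.0
    return increase / window_minutes if window_minutes > 0 else 0.0


def burn_rate(errors_5xx: int, total_requests: int, slo_target: float) -> float:
    """Error-budget burn rate: observed 5XX ratio / allowed error ratio.

    1.0 means the budget is used up exactly at the end of the SLO window;
    values above 1.0 mean it will run out early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 0.0
    error_ratio = errors_5xx / total_requests
    return error_ratio / (1.0 - slo_target)


if __name__ == "__main__":
    print(classify(83.0, NODE_CPU))     # warning
    print(classify(83.0, NODE_MEMORY))  # ok
    # 1 + 2 restarts, then a reset to 1: 4 restarts over 3 minutes
    print(round(restart_rate([(0, 3), (60, 4), (120, 6), (180, 1)]), 2))  # 1.33
    # 50 errors in 10,000 requests against an assumed 99.9% SLO
    print(round(burn_rate(50, 10_000, slo_target=0.999), 2))  # 5.0
```

The same 83% utilisation reads as a warning on the CPU panel but not on the memory panel, because memory uses the higher 85% warn line. A burn rate of 5.0 means the error budget would be exhausted in one fifth of the SLO window.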

Data source

OpenTelemetry Collector with the Kubernetes receiver plus kubelet metrics.
The APM panels (p99 Latency, 5XX Error Rate) require OTel SDK instrumentation.

Variables

k8s_cluster_name — cluster selector
k8s_namespace_name — multi-select namespace filter

feat: add K8s SRE Incident Response dashboard

a5adc1a

infrawithshobhit wants to merge 1 commit into SigNoz:main from infrawithshobhit:feat/k8s-sre-incident-response
