Vigil — Kubernetes-native uptime monitor

An uptime monitor and status page built as a Kubernetes application around an UptimeCheck custom resource and its operator: checks are cluster objects, probe results live in their status, history flows into Postgres, and state changes fire signed webhooks. Packaged as a Helm chart and hardened with restricted Pod Security and default-deny NetworkPolicies.

$ kubectl get uptimechecks
NAME          URL                      STATE   CODE   LATENCY(MS)   SINCE
always-down   https://broken.invalid   down    0      3002          2026-06-06T22:41:07+00:00
example       https://example.com      up      200    89            2026-06-06T22:41:05+00:00
github        https://github.com       up      200    142           2026-06-06T22:41:04+00:00

Run it locally

make cluster-up    # k3d cluster, ingress mapped to localhost:8080
make build         # docker build + import into the cluster
make deploy        # namespace (kubectl, carries PSS labels) + helm install
make smoke         # 10-step verification
open http://localhost:8080

Requires: Docker, k3d, kubectl, helm. No AWS account needed for local dev.

Architecture

UptimeCheck CRs ──watched──→ operator (kopf) ──probe loop per check──→ targets
   ▲    │ status patched back      │
   │    ▼                          ├─→ Postgres (results, transitions)
 helm  api (FastAPI) ──reads──┐    └─→ webhook alert on state change (HMAC-signed)
values  │     └── uptime %, events ← Postgres
        ▼
ingress → status page (fetch-polling)
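
The diagram marks the state-change webhook as HMAC-signed. A minimal sketch of what signing and receiver-side verification could look like follows. The README does not say which hash, payload shape, or secret handling Vigil uses, so SHA-256, the `event` fields, and `demo-secret` are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical alert payload; the field names are illustrative only.
event = {"check": "example", "from": "up", "to": "down"}
# Serialize once and sign exactly the bytes that go on the wire.
body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
secret = b"demo-secret"  # assumption: shared secret distributed out of band

signature = sign_payload(secret, body)
```

A receiver would call `verify_signature` on the raw body it received, before parsing the JSON, and reject the request when it returns `False`.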

Operator pattern: create/edit/delete an UptimeCheck and the probe loop reconciles within seconds — no restarts, no config files. Transition detection is seeded from prior status, so a fix-the-URL edit still alerts.
Two ServiceAccounts, least privilege: the operator may watch/patch checks and their status; the api is read-only.
History is best-effort by design: Postgres down → monitoring continues, only uptime %/events suffer.
Hardening: restricted Pod Security (all pods non-root, read-only rootfs, no capabilities, seccomp) and default-deny NetworkPolicies with five explicit allows (DNS, ingress→api, api→db, operator→db, operator→probe targets). The bundled demo Postgres runs under the same constraints.
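
The point about seeding transition detection from prior status can be illustrated with a small sketch. This is not the operator's actual code: `seed_state`, `detect_transition`, and the `state` key are hypothetical names, and treating a check with no prior state as "no transition" is an assumption.

```python
from typing import Optional, Tuple


def seed_state(status: dict) -> Optional[str]:
    """Read the last known state from the resource's status, if any."""
    return status.get("state")


def detect_transition(
    previous: Optional[str], current: str
) -> Optional[Tuple[str, str]]:
    """Return (previous, current) when the state changed, else None.

    Assumption: a check with no previous state (its very first probe)
    does not count as a transition.
    """
    if previous is None or previous == current:
        return None
    return (previous, current)


# A check was down, its URL gets fixed, and the probe loop restarts.
# Seeding from the stored status keeps the "down" memory across the edit.
seeded = detect_transition(seed_state({"state": "down"}), "up")

# Without seeding, the restarted loop has no memory and stays silent.
unseeded = detect_transition(None, "up")
```

In this sketch `seeded` is `("down", "up")`, which would trigger an alert, while `unseeded` is `None`.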

Field notes (bugs this repo survived)

kubectl apply -f dir/ applies files in alphabetical order, so namespaced resources were submitted before the namespace existed and failed with NotFound. Deploy order is now explicit.
Rollouts 502'd briefly: the ingress kept routing to the dying pod. Fixed with a preStop sleep; the smoke test demands a streak of 200s before trusting the ingress (round-robin can sneak one dying pod past a single probe).
Namespace deletion deadlocked on kopf's finalizers: the operator was already gone, so nothing was left to remove them. Our delete cleanup is in-memory only, so the delete handler is now registered with optional=True, which means no finalizer is added and deletions never block on Vigil.
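
The "streak of 200s" rule from the rollout note above can be sketched as a small helper. The real smoke test lives in the repo's scripts and may look different; `has_ok_streak` and the threshold of 3 are hypothetical.

```python
from typing import Iterable


def has_ok_streak(codes: Iterable[int], required: int) -> bool:
    """Return True once `required` consecutive 200 responses are seen."""
    streak = 0
    for code in codes:
        # Any non-200 response resets the streak to zero.
        streak = streak + 1 if code == 200 else 0
        if streak >= required:
            return True
    return False


# A single lucky 200 followed by a 502 from the dying pod is not enough.
flaky = has_ok_streak([200, 502, 200, 200], required=3)

# Three 200s in a row after the 502 are.
settled = has_ok_streak([200, 502, 200, 200, 200], required=3)
```

With `required=1` this degenerates to the single-probe check that round-robin routing can fool, which is why a longer streak is demanded.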

Roadmap

~~ConfigMap-driven checker + status page on k3d~~
~~UptimeCheck CRD + kopf operator~~
~~Postgres history, uptime %, transitions + signed webhook alerts~~
~~Restricted PSS, default-deny NetworkPolicies, Helm chart~~
~~EKS via Terraform (public-subnet nodes — no NAT Gateway — IRSA, ALB, ECR)~~ — see docs/eks-runbook.md

Deploying to AWS (EKS)

The same chart runs on EKS — only the cluster and ingress class change. Full steps in docs/eks-runbook.md; in short:

make eks-up           # Terraform: VPC (no NAT) + EKS + ECR + ALB-controller IRSA + budget
make eks-kubeconfig
make eks-push         # image → ECR
# install the AWS Load Balancer Controller (runbook step 4), then:
helm upgrade --install vigil charts/vigil -f charts/vigil/values.yaml \
  -f charts/vigil/values-eks.yaml --set image.repository=$ECR -n vigil
make eks-down         # ALWAYS — EKS bills ~$0.50/session, ~$3/day if forgotten

