This repository manages the infrastructure for the SMC online-code-test platform.
SMC-Infra/
├── core/
│ └── traefik/ # Edge proxy + DNS-01 wildcard cert
│ ├── docker-compose.yaml
│ └── .env.example
├── services/
│ └── observe/ # Observability stack
│ ├── docker-compose.yaml # Prometheus, Alertmanager, Grafana, Loki, Tempo, Blackbox, Node-exporter
│ ├── config/ # Prometheus / Loki / Tempo configs, alert rules, blackbox targets
│ ├── dashboards/ # Grafana dashboards
│ ├── dashboards.yaml
│ ├── datasources.yaml
│ └── scripts/ # Rule + target generators, hot-reload
├── scripts/
│ ├── core-manager.sh
│ ├── smoke-watchdog.sh # Smoke test for Watchdog → Healthchecks
│ └── utils.sh
└── Makefile
make helpcore-up Start Traefik (creates smc-traefik network if missing)
core-down Stop Traefik
core-restart Restart Traefik (stop + start)
logs Tail Traefik logs
check Check whether the Traefik container is running
network-init Create the shared smc-traefik network if it does not already exist
# 1. Configure
cd core/traefik
cp .env.example .env
# fill in the values referenced in .env.example
# 2. Start
cd ../..
make core-up
make logsTraefik mints a single wildcard cert covering ${DOMAIN} and *.${DOMAIN} (SANs are declared on the dashboard router via tls.domains). Every sibling service — cd., exam., api., grafana., ephemeral pr-N., … — reuses that cert as soon as a container with matching Traefik labels joins smc-traefik. No per-subdomain DNS-01 dance is required.
Observability stack including Prometheus, Alertmanager, Grafana, Loki, Tempo, Blackbox exporter, and Node-exporter. Services stay on the internal network.
cd services/observe
cp .env.default .env # then edit
docker compose up -dHighlights:
- Alert rules for endpoint availability / latency / status-code, SSL-cert expiry, and Node-exporter (Hardware).
- Blackbox targets configured under
config/prometheus/targets/. - Alertmanager routes by severity to Discord (critical + default); webhook URLs live under
config/prometheus/secrets/discord/ - Watchdog — an always-firing Prometheus alert (
vector(1)) routed through a dedicatedhealthchecksreceiver to Healthchecks.io every 15 m. If pings stop arriving, Healthchecks notifies us that the monitoring pipeline itself is down (dead-man's switch). - Grafana dashboards for Prometheus, Alertmanager, Blackbox, and the Node-exporter are provisioned automatically.
See core/traefik/.env.example for the variables to set in core/traefik/.env.
Prerequisite: a wildcard A-record *.${DOMAIN} → <host IP> exists in Cloudflare
-
In the target service's
docker-compose.yaml, join thesmc-traefiknetwork (declaredexternal: true) and add labels:labels: - "traefik.enable=true" - "traefik.docker.network=smc-traefik" - "traefik.http.routers.<name>.rule=Host(`<name>.${DOMAIN}`)" - "traefik.http.routers.<name>.entrypoints=websecure" - "traefik.http.routers.<name>.tls.certresolver=cloudflare" - "traefik.http.services.<name>.loadbalancer.server.port=<container-port>"
-
Bring the service up. Traefik discovers the labels and reuses the existing wildcard cert.
There are two possible paths depending on whether the DNS record is proxied.
browser --TLS--> Traefik origin
(origin cert: Let's Encrypt in acme.json)
browser --TLS--> Cloudflare edge ---------TLS---> Traefik origin
(edge cert: CF Universal SSL) (origin cert: Let's Encrypt in acme.json)
Traefik must be running before starting any sibling service that relies on reverse-proxy rules. Without it, routing and TLS termination will not function.