diff --git a/CLAUDE.md b/CLAUDE.md
index 082238d..1ae4bfe 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -23,7 +23,7 @@ There is no test suite. After changing `docker-compose.yml`, always run `docker
 
 **Network isolation matters.** Services are split across four bridge networks and one host-mode service. A service can only reach another if they share a network — adding a new service requires picking the right one (or declaring multiple):
 
-- `monitoring_network` — tautulli, grafana, telegraf, watchtower, portainer
+- `monitoring_network` — tautulli, grafana, telegraf, watchtower, portainer, prometheus, cadvisor, node-exporter
 - `media_network` — seerr, radarr, sonarr, prowlarr, bazarr
 - `download_network` — transmission, watchlistarr, cleanarr, requestrr, radarr, sonarr
 - `tracearr-network` — tracearr, timescale (PostgreSQL), redis
@@ -37,7 +37,9 @@ Radarr and Sonarr are deliberately on both `media_network` (so Seerr, Prowlarr,
 
 **Volume paths are intentionally user-specific.** All bind mounts are rooted at `${USERDIR}` from `.env`. When advising the user, do not assume any particular host path layout — the README explicitly tells them to update paths to match their drive mounts.
 
-**No Prometheus in the stack.** There is no `prometheus` service in `docker-compose.yml`. A starting-point config lives at `docs/prometheus.example.yml` for users who want to add Prometheus themselves — don't assume metrics are being scraped today.
+**Prometheus is live, but only cAdvisor + node-exporter feed it.** `prometheus/prometheus.yml` is mounted into the prometheus container; only the `cadvisor` and `node_exporter` scrape jobs are active. The `telegraf` and `tautulli` jobs are commented out because those containers don't expose a `/metrics` endpoint by default — telegraf would need the `prometheus_client` output plugin enabled, and Tautulli needs a metrics plugin installed. Grafana points at Prometheus the same way it would point at any data source — configure it in the Grafana UI after first boot.
+
+**cAdvisor needs `privileged: true` and several host-fs mounts.** This is the standard cAdvisor pattern; flag it if a user reports security concerns about the monitoring stack. It also binds host port `8080` — if a user has something else on `8080`, that's the conflict.
 
 **Transmission uses `haugene/transmission-openvpn` and won't start without VPN credentials.** The container runs an OpenVPN client internally; `OPENVPN_PROVIDER`, `OPENVPN_CONFIG`, `OPENVPN_USERNAME`, and `OPENVPN_PASSWORD` must all be set in `.env`. The compose service declares `cap_add: NET_ADMIN` and `devices: /dev/net/tun` for the OpenVPN client; the data volume is `/data` (haugene's convention), not `/config` like the linuxserver image. `LOCAL_NETWORK` (CIDR, default `192.168.0.0/16`) controls which destinations bypass the tunnel — if a user reports the web UI is unreachable, this is almost always the cause.
diff --git a/README.md b/README.md
index 22eb576..1b55110 100644
--- a/README.md
+++ b/README.md
@@ -95,6 +95,10 @@ flowchart LR
     Telegraf -->|host + container metrics| Grafana
     Tautulli -->|usage data| Grafana
+
+    cAdvisor -->|container metrics| Prometheus
+    NodeExporter[node-exporter] -->|host metrics| Prometheus
+    Prometheus -->|scraped time-series| Grafana
 ```
 
 See [Network Architecture](#network-architecture) below for the exact network membership of each service.
 
@@ -143,6 +147,9 @@ A ready-to-use [Kometa](https://kometa.wiki/) (Plex Meta Manager) configuration
 |---------|-------------|------|
 | [Tautulli](https://tautulli.com/) | Plex usage monitoring | `8181` |
 | [Grafana](https://grafana.com/) | Metrics visualization | `3000` |
+| [Prometheus](https://prometheus.io/) | Time-series metrics database — scrapes cAdvisor + node-exporter | `9090` |
+| [cAdvisor](https://github.com/google/cadvisor) | Per-container CPU / memory / network metrics | `8080` |
+| [node-exporter](https://github.com/prometheus/node_exporter) | Host (CPU / disk / network) metrics | `9100` |
 | [Telegraf](https://www.influxdata.com/time-series-platform/telegraf/) | Metrics collection agent | N/A |
 | [Tracearr](https://github.com/connorgallopo/tracearr) | Stream tracking and account sharing detection | `3001` |
 | [Portainer](https://www.portainer.io/) | Docker management UI ([note on socket access](#a-note-on-portainer)) | `9000` |
@@ -166,7 +173,7 @@ These services pair well with this stack but are not included in the default `do
 
 Services are isolated into separate Docker networks:
 
-- **`monitoring_network`** - Tautulli, Grafana, Telegraf, Watchtower, Portainer
+- **`monitoring_network`** - Tautulli, Grafana, Telegraf, Watchtower, Portainer, Prometheus, cAdvisor, node-exporter
 - **`media_network`** - Seerr, Radarr, Sonarr, Prowlarr, Bazarr
 - **`download_network`** - Transmission, Watchlistarr, Cleanarr, Requestrr, Radarr, Sonarr
 - **`tracearr-network`** - Tracearr, TimescaleDB, Redis
diff --git a/docker-compose.yml b/docker-compose.yml
index fd2f0e7..9192be4 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -83,6 +83,62 @@ services:
       - portainer_data:/data
     restart: unless-stopped
 
+  prometheus:
+    container_name: prometheus
+    image: prom/prometheus:latest
+    networks:
+      - monitoring_network
+    ports:
+      - "9090:9090"
+    environment:
+      - TZ=${TZ}
+    command:
+      - "--config.file=/etc/prometheus/prometheus.yml"
+      - "--storage.tsdb.path=/prometheus"
+    volumes:
+      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
+      - prometheus_data:/prometheus
+    restart: unless-stopped
+
+  # Container resource metrics for Prometheus. Reads the Docker socket
+  # read-only and several host filesystems to enumerate running containers.
+  cadvisor:
+    container_name: cadvisor
+    image: gcr.io/cadvisor/cadvisor:latest
+    networks:
+      - monitoring_network
+    ports:
+      - "8080:8080"
+    volumes:
+      - /:/rootfs:ro
+      - /var/run:/var/run:ro
+      - /sys:/sys:ro
+      - /var/lib/docker/:/var/lib/docker:ro
+      - /dev/disk/:/dev/disk:ro
+    devices:
+      - /dev/kmsg
+    privileged: true
+    restart: unless-stopped
+
+  # Host (kernel / disk / network) metrics for Prometheus.
+  node-exporter:
+    container_name: node-exporter
+    image: prom/node-exporter:latest
+    networks:
+      - monitoring_network
+    ports:
+      - "9100:9100"
+    command:
+      - "--path.procfs=/host/proc"
+      - "--path.sysfs=/host/sys"
+      - "--path.rootfs=/rootfs"
+      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
+    volumes:
+      - /proc:/host/proc:ro
+      - /sys:/host/sys:ro
+      - /:/rootfs:ro
+    restart: unless-stopped
+
   # ============ MEDIA MANAGEMENT ============
   seerr:
     image: ghcr.io/seerr-team/seerr:latest
@@ -318,5 +374,6 @@ volumes:
   prowlarr:
   bazarr:
   portainer_data:
+  prometheus_data:
   timescale_data:
   redis_data:
diff --git a/docs/prometheus.example.yml b/docs/prometheus.example.yml
deleted file mode 100644
index cb7623c..0000000
--- a/docs/prometheus.example.yml
+++ /dev/null
@@ -1,66 +0,0 @@
-# my global config
-global:
-  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
-  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
-  # scrape_timeout is set to the global default (10s).
-
-# Alertmanager configuration
-alerting:
-  alertmanagers:
-    - static_configs:
-        - targets:
-          # - alertmanager:9093
-
-# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
-rule_files:
-  # - "first_rules.yml"
-  # - "second_rules.yml"
-
-# A scrape configuration containing exactly one endpoint to scrape:
-# Here it's Prometheus itself.
-scrape_configs:
-  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
-  - job_name: "prometheus"
-    # metrics_path defaults to '/metrics'
-    # scheme defaults to 'http'.
-    static_configs:
-      - targets: ["localhost:9090"]
-
-  # Uncomment after adding cadvisor to docker-compose.yml
-  # # Docker metrics via cAdvisor
-  # - job_name: "docker"
-  #   static_configs:
-  #     - targets: ["cadvisor:8080"]
-
-  # Uncomment after adding node-exporter to docker-compose.yml
-  # # Host metrics via Node Exporter
-  # - job_name: "node_exporter"
-  #   static_configs:
-  #     - targets: ["node-exporter:9100"]
-
-  # Telegraf metrics
-  - job_name: "telegraf"
-    static_configs:
-      - targets: ["telegraf:9273"]
-
-  # Tautulli metrics
-  - job_name: "tautulli"
-    metrics_path: "/metrics"
-    static_configs:
-      - targets: ["tautulli:8181"]
-
-  # Uncomment after adding netdata to docker-compose.yml
-  # # Netdata metrics
-  # - job_name: "netdata"
-  #   metrics_path: "/api/v1/allmetrics"
-  #   params:
-  #     format: [prometheus]
-  #   honor_labels: true
-  #   static_configs:
-  #     - targets: ["netdata:19999"]
-
-  # Uncomment after adding plex-exporter to docker-compose.yml
-  # # Plex Exporter metrics
-  # - job_name: "plex_exporter"
-  #   static_configs:
-  #     - targets: ["plex-exporter:9594"]
\ No newline at end of file
diff --git a/prometheus/prometheus.yml b/prometheus/prometheus.yml
new file mode 100644
index 0000000..1ba6ec2
--- /dev/null
+++ b/prometheus/prometheus.yml
@@ -0,0 +1,33 @@
+global:
+  scrape_interval: 15s
+  evaluation_interval: 15s
+
+alerting:
+  alertmanagers:
+    - static_configs:
+        - targets:
+          # - alertmanager:9093
+
+scrape_configs:
+  # Container resource metrics via cAdvisor
+  - job_name: "cadvisor"
+    static_configs:
+      - targets: ["cadvisor:8080"]
+
+  # Host metrics (CPU, memory, disk, network) via node_exporter
+  - job_name: "node_exporter"
+    static_configs:
+      - targets: ["node-exporter:9100"]
+
+  # Telegraf — uncomment after enabling the prometheus_client output plugin in
+  # telegraf.conf. The container doesn't expose :9273 by default.
+  # - job_name: "telegraf"
+  #   static_configs:
+  #     - targets: ["telegraf:9273"]
+
+  # Tautulli — uncomment after installing a Prometheus metrics plugin in
+  # Tautulli. The base container does not expose /metrics natively.
+  # - job_name: "tautulli"
+  #   metrics_path: "/metrics"
+  #   static_configs:
+  #     - targets: ["tautulli:8181"]
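Reviewer note: the CLAUDE.md change above says the commented-out `telegraf` scrape job needs the `prometheus_client` output plugin enabled before it can be uncommented. For reference, a minimal sketch of that stanza, assuming it is added to whatever `telegraf.conf` this stack mounts (the file location and exact settings are assumptions, not taken from this repo):

```toml
# telegraf.conf (sketch): expose Telegraf's collected metrics on /metrics
# so the "telegraf" job in prometheus/prometheus.yml has something to scrape.
[[outputs.prometheus_client]]
  # Port 9273 matches the target the commented-out scrape job expects
  # (telegraf:9273); both containers are on monitoring_network.
  listen = ":9273"
  metric_version = 2
```

With this in place, the `telegraf` job and its `targets: ["telegraf:9273"]` entry can be uncommented; the same pattern applies to Tautulli once a metrics plugin exposes an endpoint there.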