diff --git a/.claude/.gitignore b/.claude/.gitignore
new file mode 100644
index 0000000..93c0f73
--- /dev/null
+++ b/.claude/.gitignore
@@ -0,0 +1 @@
+settings.local.json
diff --git a/.claude/README.md b/.claude/README.md
new file mode 100644
index 0000000..c531d3f
--- /dev/null
+++ b/.claude/README.md
@@ -0,0 +1,46 @@
+# .claude/ — Project Configuration
+
+## Commands
+
+| Command | Description |
+|---|---|
+| `/pm` | Project manager mode — discuss, plan, delegate |
+| `/impl <feature>` | Implement a feature from specs |
+| `/test [packages]` | Run tests with race detector |
+| `/check` | Full CI pipeline (build, vet, fmt, lint, test) |
+
+## Agents
+
+| Agent | Role |
+|---|---|
+| `spec-writer` | Write/update specs from a brief |
+| `developer` | Implement Go code from specs |
+| `code-reviewer` | Review code quality and idioms |
+| `spec-checker` | Verify implementation matches specs |
+
+## Structure
+
+```
+.claude/
+├── CLAUDE.md              # Project instructions (always loaded)
+├── settings.json          # Permissions, env vars
+├── rules/                 # Auto-loaded by file type
+│   ├── go-style.md        #   *.go → naming, errors, concurrency
+│   ├── architecture.md    #   internal/**, cmd/** → packages, interfaces, DI
+│   ├── testing.md         #   *_test.go → table-driven, mocking
+│   └── security.md        #   *.go, *.yaml, *.json → secrets, exec safety
+├── skills/                # Loaded on demand
+│   ├── pm/                #   Project manager orchestrator
+│   ├── go-expert/         #   Go patterns (oklog/run, slog, exec...)
+│   └── scaleset-sdk/      #   actions/scaleset SDK reference
+├── agents/                # Specialized subagents
+│   ├── developer.md
+│   ├── spec-writer.md
+│   ├── code-reviewer.md
+│   └── spec-checker.md
+└── commands/              # Slash commands
+    ├── pm.md
+    ├── impl.md
+    ├── test.md
+    └── check.md
+```
diff --git a/.claude/agents/code-reviewer.md b/.claude/agents/code-reviewer.md
new file mode 100644
index 0000000..4c6a0ab
--- /dev/null
+++ b/.claude/agents/code-reviewer.md
@@ -0,0 +1,24 @@
+---
+name: code-reviewer
+description: Review Go code for correctness, idioms, error handling, concurrency safety, and alignment with project specs.
+model: sonnet
+effort: 3
+allowedTools:
+  - Read
+  - Grep
+  - Glob
+---
+
+# Go Code Reviewer
+
+You review Go code in the ghr project. Focus on:
+
+1. **Correctness**: Does the code do what the spec says? Check against `specs/` files.
+2. **Error handling**: Every error wrapped with context? No ignored errors? Sentinel errors used correctly?
+3. **Concurrency**: Mutex used correctly? No data races? Context propagation complete? Goroutines have shutdown paths?
+4. **Interfaces**: Consumer-side only? Minimal (1-3 methods)? No getter interfaces?
+5. **Go idioms**: Naming follows Go conventions? No Java patterns? Structs with exported fields?
+6. **Security**: No hardcoded secrets? No unsanitized exec input? Permissions checked?
+7. **Tests**: Coverage of error paths? Table-driven? Race detector compatible?
+
+Be specific. Reference line numbers. Suggest concrete fixes, not vague improvements.
diff --git a/.claude/agents/developer.md b/.claude/agents/developer.md
new file mode 100644
index 0000000..9063186
--- /dev/null
+++ b/.claude/agents/developer.md
@@ -0,0 +1,64 @@
+---
+name: developer
+description: Implement Go code for ghr v2. Receives precise instructions from the PM with spec references, files to create/modify, and expected behavior. Writes production-quality Go code with tests.
+model: opus
+effort: 3
+allowedTools:
+  - Read
+  - Write
+  - Edit
+  - Bash
+  - Grep
+  - Glob
+---
+
+# Developer
+
+You are a senior Go developer implementing features for the ghr v2 project.
+
+## Input
+
+You receive **implementation instructions** from the PM containing:
+- Which spec(s) to follow (read them first)
+- Which files to create or modify
+- Expected behavior and edge cases
+- Dependencies on other packages
+
+## Process
+
+1. **Read the spec** — understand exactly what's expected
+2. **Read the architecture** — `specs/00-architecture.md` for package placement and patterns
+3. **Read existing code** — understand what's already implemented, import conventions
+4. **Implement** — write the code, following the spec precisely
+5. **Write tests** — alongside the implementation, not after
+6. **Verify** — `go build ./cmd/ghr` and `go test -race ./...`
+7. **Report** — list what was created/modified and any deviations from spec
+
+## Code standards
+
+- Package-by-feature under `internal/`
+- Consumer-side interfaces (defined where consumed, unexported, minimal)
+- Structs with exported fields (no getter interfaces)
+- Error wrapping: `fmt.Errorf("context: %w", err)`
+- `context.Context` as first parameter
+- Table-driven tests with `t.Run`
+- `oklog/run` for top-level actors, internal retry for per-group goroutines
+- Secrets via env vars, never hardcoded
+- No `any` without justification
+- No ignored errors with `_`
+
+## What you do NOT do
+
+- You don't decide architecture — that's in the specs
+- You don't add features not in the spec — flag them to the PM
+- You don't skip tests — every exported function gets tested
+- You don't skip error handling — every error is wrapped and returned
+- You don't use global state — everything via dependency injection
+
+## When something is unclear
+
+If the spec is ambiguous or you find a contradiction:
+1. State what's unclear
+2. State the two (or more) interpretations
+3. State which you'd pick and why
+4. Implement your pick but flag it in your report
diff --git a/.claude/agents/spec-checker.md b/.claude/agents/spec-checker.md
new file mode 100644
index 0000000..708f95f
--- /dev/null
+++ b/.claude/agents/spec-checker.md
@@ -0,0 +1,31 @@
+---
+name: spec-checker
+description: Verify that implementation matches the project specs. Use when implementing a new feature to ensure nothing is missed.
+model: sonnet
+effort: 3
+allowedTools:
+  - Read
+  - Grep
+  - Glob
+---
+
+# Spec Compliance Checker
+
+You verify that Go code matches the specs in `specs/`. For a given feature:
+
+1. Read the relevant spec file(s) from `specs/`
+2. Read the implementation code
+3. Compare point by point:
+   - Are all specified behaviors implemented?
+   - Are all edge cases handled as described?
+   - Do struct fields match the spec?
+   - Do function signatures match?
+   - Are config defaults correct?
+   - Are error messages as specified?
+4. Report:
+   - Implemented correctly
+   - Missing from implementation
+   - Deviations from spec (with reasoning if the deviation seems intentional)
+   - Spec ambiguities discovered during review
+
+Be thorough. Cross-reference between specs (e.g., spec 01 references spec 08 for auth).
diff --git a/.claude/agents/spec-writer.md b/.claude/agents/spec-writer.md
new file mode 100644
index 0000000..34a3c83
--- /dev/null
+++ b/.claude/agents/spec-writer.md
@@ -0,0 +1,66 @@
+---
+name: spec-writer
+description: Write detailed technical specs for ghr v2 features. Receives a brief from the PM, reads existing specs for context, and produces a complete spec document.
+model: opus
+effort: 3
+allowedTools:
+  - Read
+  - Write
+  - Edit
+  - Grep
+  - Glob
+---
+
+# Spec Writer
+
+You write technical specifications for the ghr v2 project.
+
+## Input
+
+You receive a **brief** from the PM that describes what needs to be specified. The brief contains:
+- What the feature does
+- User's requirements and decisions
+- Related existing specs to reference
+- Any constraints or non-goals
+
+## Process
+
+1. **Read existing specs** for context (especially `specs/00-architecture.md`)
+2. **Read related specs** mentioned in the brief
+3. **Write the spec** following the established format
+
+## Spec format
+
+Follow the same structure as existing specs in `specs/`:
+
+```markdown
+# Spec XX — Title
+
+## Overview
+1-2 sentences describing the feature.
+
+---
+
+## [Feature sections]
+Detailed description with:
+- Go code examples (structs, interfaces, function signatures)
+- Config YAML examples
+- Flow descriptions (startup, shutdown, error handling)
+- Decision rationale (why this approach)
+
+## Config schema
+Relevant YAML fields for this feature.
+
+## Integration points
+How this feature connects to other specs/packages.
+```
+
+## Rules
+
+- Be specific — include Go signatures, YAML examples, concrete values
+- Reference other specs by number (e.g., "see spec 08-auth.md")
+- Use the same terminology as existing specs
+- Flag any contradictions with existing specs
+- Don't over-specify implementation details that should be left to the developer
+- Config secrets via env vars only, never in YAML
+- Follow the architecture from spec 00 (package-by-feature, consumer-side interfaces)
diff --git a/.claude/commands/check.md b/.claude/commands/check.md
new file mode 100644
index 0000000..85dd54d
--- /dev/null
+++ b/.claude/commands/check.md
@@ -0,0 +1,16 @@
+---
+description: Run full CI checks locally (build, vet, fmt, lint, test)
+allowed-tools: Bash(go *) Bash(gofmt *) Bash(golangci-lint *)
+---
+
+# Full CI Check
+
+Run the complete check pipeline:
+
+1. `go build ./cmd/ghr` — must compile
+2. `go vet ./...` — static analysis
+3. Check formatting: `gofmt -l .` — must return empty (all formatted)
+4. `golangci-lint run` — lint (config: `.golangci.yml`)
+5. `go test -race ./...` — all tests with race detector
+
+Report pass/fail for each step. Stop on first failure.
diff --git a/.claude/commands/impl.md b/.claude/commands/impl.md
new file mode 100644
index 0000000..c3a2426
--- /dev/null
+++ b/.claude/commands/impl.md
@@ -0,0 +1,17 @@
+---
+description: Implement a feature from a spec. Reads the spec, plans, implements, tests.
+---
+
+# Implement from Spec
+
+Implement $ARGUMENTS following this workflow:
+
+1. **Read the spec**: Find the relevant spec in `specs/` for the requested feature
+2. **Read architecture**: Check `specs/00-architecture.md` for package placement and interfaces
+3. **Plan**: List the files to create/modify, the structs, interfaces, and functions needed
+4. **Implement**: Write the code following the spec precisely
+5. **Test**: Write tests alongside the implementation
+6. **Verify**: Run `go build ./cmd/ghr` and `go test -race ./...`
+7. **Review**: Check against the spec for any missed items
+
+If the spec is ambiguous or contradicts another spec, flag it before implementing.
diff --git a/.claude/commands/pm.md b/.claude/commands/pm.md
new file mode 100644
index 0000000..d558193
--- /dev/null
+++ b/.claude/commands/pm.md
@@ -0,0 +1,21 @@
+---
+description: Start the project manager mode. Discuss features, create specs, plan and delegate implementation.
+---
+
+# Project Manager Mode
+
+You are now the **project manager** for ghr v2. Read the PM skill at `.claude/skills/pm/SKILL.md` for your full instructions.
+
+Before anything else:
+1. Read `specs/00-architecture.md` to understand the current architecture
+2. Check what exists in `internal/` to know the project state
+3. Greet the user and ask what they want to work on
+
+Task from user: $ARGUMENTS
+
+If no arguments, ask what they want to work on. Options:
+- Discuss a feature or idea
+- Create a new spec
+- Implement a feature from an existing spec
+- Review project status
+- Something else
diff --git a/.claude/commands/test.md b/.claude/commands/test.md
new file mode 100644
index 0000000..1c1c227
--- /dev/null
+++ b/.claude/commands/test.md
@@ -0,0 +1,16 @@
+---
+description: Run tests with race detector and show results
+allowed-tools: Bash(go test *)
+---
+
+# Run Tests
+
+Run tests for $ARGUMENTS (default: all packages):
+
+```bash
+go test -race -v $ARGUMENTS
+```
+
+If no arguments: `go test -race ./...`
+
+After tests complete, summarize: passed/failed/skipped counts and any failures.
diff --git a/.claude/rules/architecture.md b/.claude/rules/architecture.md
new file mode 100644
index 0000000..dbb5ad9
--- /dev/null
+++ b/.claude/rules/architecture.md
@@ -0,0 +1,39 @@
+---
+paths:
+  - "internal/**/*.go"
+  - "cmd/**/*.go"
+---
+
+# Architecture Rules
+
+## Package structure
+- Package-by-feature under `internal/`, one level deep. No `domain/`, `app/`, `infra/` layers.
+- `internal/model/` contains ONLY shared data structs and enums. No interfaces. No logic. Under 100 LOC.
+- Each package owns its feature end-to-end.
+
+## Interfaces
+- Define interfaces where they are CONSUMED, not where they are implemented.
+- Consumer-side interfaces are unexported (lowercase) and minimal (1-3 methods).
+- Never create a central `ports.go` or `interfaces.go`.
+- Never create getter interfaces (`ID() string`, `Name() string`). Use struct fields.
+
+## Dependencies
+- Dependency injection is manual in `cmd/ghr/main.go`. No DI framework.
+- The `controller/` package defines what it needs from `github/` via a small interface.
+- The `health/` package defines what it needs from `controller/` via a small interface.
+- Import direction: `cli` → `controller` → `github`, `runner`, `notification`. Never the reverse.
+
+## Concurrency
+- `oklog/run.Group` for the top-level daemon actors (controller, health, API server, signal handler).
+- When ONE actor fails, ALL are interrupted — clean deterministic shutdown.
+- Per-group goroutines are managed INSIDE the controller with their own retry logic.
+- A single group failure does NOT kill other groups.
+
+## Configuration
+- All config values come from the config struct. No global variables.
+- Secrets via env vars only, never in YAML.
+- Auth credentials via `ghr login` / credentials file, not config.
+
+## Specs
+- Before implementing a feature, read the corresponding spec in `specs/`.
+- If the spec is unclear or you need to deviate, flag it rather than guessing.
diff --git a/.claude/rules/code-cleanliness.md b/.claude/rules/code-cleanliness.md
new file mode 100644
index 0000000..7747f32
--- /dev/null
+++ b/.claude/rules/code-cleanliness.md
@@ -0,0 +1,22 @@
+# Code Cleanliness
+
+## Comments
+- No comments in code, with one exception: a comment that explains why something non-obvious is done. Code must be self-documenting through clear naming.
+- No godoc comments on types, functions, or methods. Names speak for themselves.
+- No inline comments, no section separators (--- lines), no TODO markers.
+- No commented-out code.
+- Also exempt: required `//go:` directives and `//nolint:` directives.
+
+## File size
+- Source files must stay under 200 LOC (excluding tests).
+- If a file grows beyond 200 LOC, split by logical concern into separate files.
+- One responsibility per file. Name files after what they contain.
+
+## Structure
+- Use subdirectories when a package has more than 5-6 files with distinct concerns.
+- Test files are exempt from the 200 LOC limit but should still be well-organized.
+- Group related types/functions in the same file. Don't scatter a concept across files.
+
+## Naming
+- File names describe their content: `handler.go`, `writer.go`, `validate.go`.
+- No generic names: `utils.go`, `helpers.go`, `common.go`, `misc.go`.
diff --git a/.claude/rules/go-style.md b/.claude/rules/go-style.md
new file mode 100644
index 0000000..3099f59
--- /dev/null
+++ b/.claude/rules/go-style.md
@@ -0,0 +1,47 @@
+---
+paths:
+  - "**/*.go"
+---
+
+# Go Style & Idioms
+
+## Naming
+- Package names: short, lowercase, singular (`runner` not `runners`, `config` not `configuration`)
+- Exported names: PascalCase, meaningful without package prefix (`runner.Process` not `runner.RunnerProcess`)
+- Unexported: camelCase
+- Acronyms: all caps (`ID`, `HTTP`, `URL`, `API`, `PID`, `JIT`)
+- Interface names: verb-er for single-method (`io.Reader`), descriptive for multi-method
+
+## Error handling
+- Always wrap with context: `fmt.Errorf("start runner %s: %w", name, err)`
+- Never ignore errors with `_` — handle or log explicitly
+- Use sentinel errors (`var ErrNotFound = errors.New(...)`) for expected conditions
+- Use `errors.Is` / `errors.As` for checking, never string comparison
+- Return early on error (no deep nesting)
+
+## Functions
+- `context.Context` always first parameter
+- Return concrete types, accept interfaces
+- Keep functions short (< 40 lines guideline)
+- Use named return values only when they make the function signature clearer
+
+## Concurrency
+- Protect shared state with `sync.Mutex` (not channels for simple state)
+- Always use `context.Context` for cancellation
+- Never start a goroutine without a way to stop it
+- Use `oklog/run` for top-level actor management
+- Use `sync.WaitGroup` or `errgroup` for worker pools
+
+## Testing
+- Table-driven with `t.Run` subtests
+- Test file in same package (white-box) or `_test` package (black-box)
+- Use `testify/assert` or `testify/require` for assertions
+- Use `httptest.Server` for HTTP tests
+- Test names: `TestFunctionName_Scenario_Expected`
+- Race detector: always run with `-race` in CI
+
+## Packages
+- Everything under `internal/` (nothing exported outside module)
+- One feature per package, no `utils/` or `helpers/`
+- Avoid circular imports — if needed, extract shared types to `model/`
+- Package-level `var` and `init()` only for simple defaults, never for complex setup
diff --git a/.claude/rules/security.md b/.claude/rules/security.md
new file mode 100644
index 0000000..22dfd46
--- /dev/null
+++ b/.claude/rules/security.md
@@ -0,0 +1,19 @@
+---
+paths:
+  - "**/*.go"
+  - "**/*.yaml"
+  - "**/*.json"
+---
+
+# Security Rules
+
+- Never hardcode secrets (tokens, keys, passwords). Use env vars or the credentials file.
+- Never log secrets. PATs are masked (`ghp_xxxx...xxxx`), JIT configs are never logged.
+- JIT configs (`EncodedJITConfig`) are secrets — treat as such until consumed by the runner.
+- Credentials file: `0600` permissions. Warn if overly permissive.
+- Private key paths: verify `0600` permissions at login time.
+- Webhook URLs (Discord, etc.): via env vars only, never in config.yaml.
+- Never `exec.Command` with unsanitized user input.
+- Never pass untrusted path components to `filepath.Join` without checking that the result stays inside the intended base directory (path traversal).
+- TLS: do not skip verification by default. Support custom CAs via config if needed.
+- Validate all external input (config values, API responses, env vars).
diff --git a/.claude/rules/testing.md b/.claude/rules/testing.md
new file mode 100644
index 0000000..fabfa2f
--- /dev/null
+++ b/.claude/rules/testing.md
@@ -0,0 +1,43 @@
+---
+paths:
+  - "**/*_test.go"
+---
+
+# Testing Rules
+
+## Structure
+- One test file per source file: `foo.go` → `foo_test.go`
+- Table-driven tests with `t.Run` for every non-trivial function
+- Group related tests in subtests: `TestGroupController/startup`, `TestGroupController/shutdown`
+
+## Naming
+- `TestFunctionName` for basic tests
+- `TestFunctionName_Scenario` for specific scenarios
+- `TestFunctionName_Scenario_Expected` for full clarity
+- Benchmark: `BenchmarkFunctionName`
+
+## Assertions
+- Use `testify/require` for fatal checks (stop test on failure)
+- Use `testify/assert` for non-fatal checks (continue test)
+- Never use bare `if err != nil { t.Fatal(err) }` when testify is available
+
+## Mocking
+- Consumer-side interfaces make mocking trivial
+- Hand-written fakes preferred over generated mocks for simple interfaces
+- Use `httptest.Server` for HTTP integration tests
+- Use the scaleset SDK's `internal/testserver` pattern for GitHub API mocks
+
+## Coverage
+- Run with `-race` flag always
+- Focus on behavior, not coverage percentage
+- Test error paths, not just happy paths
+- Timeouts in tests: use `context.WithTimeout` or `time.After`, never bare `time.Sleep`
+
+## What to test
+- `model/` — no tests needed (pure data)
+- `controller/` — mock github client + runner backend
+- `runner/` — test binary download with httptest, process lifecycle with real exec
+- `health/` — mock runner state + reporter interfaces
+- `notification/` — test providers against httptest.Server
+- `config/` — table-driven validation
+- `cli/` — thin layer, minimal tests
diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 0000000..dd1467c
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,45 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(go build *)",
+      "Bash(go test *)",
+      "Bash(go fmt *)",
+      "Bash(go vet *)",
+      "Bash(go mod *)",
+      "Bash(go run *)",
+      "Bash(gofmt *)",
+      "Bash(golangci-lint *)",
+      "Bash(git status)",
+      "Bash(git diff *)",
+      "Bash(git log *)",
+      "Bash(git branch *)",
+      "Bash(git show *)",
+      "Bash(ls *)",
+      "Bash(find *)",
+      "Bash(wc *)",
+      "Bash(head *)",
+      "Bash(tail *)",
+      "Bash(cat go.mod)",
+      "Bash(cat go.sum)",
+      "Bash(mkdir -p *)",
+      "Read",
+      "Edit",
+      "Write",
+      "Grep",
+      "Glob"
+    ],
+    "deny": [
+      "Bash(rm -rf /)",
+      "Bash(sudo *)",
+      "Read(.env)",
+      "Read(**/.env)",
+      "Read(**/credentials.json)",
+      "Edit(.env)",
+      "Edit(**/credentials.json)"
+    ]
+  },
+  "env": {
+    "GOPROXY": "https://proxy.golang.org",
+    "CGO_ENABLED": "1"
+  }
+}
diff --git a/.claude/skills/go-expert/SKILL.md b/.claude/skills/go-expert/SKILL.md
new file mode 100644
index 0000000..b26dbae
--- /dev/null
+++ b/.claude/skills/go-expert/SKILL.md
@@ -0,0 +1,74 @@
+---
+name: go-expert
+description: Advanced Go patterns and best practices for daemon/service projects. Use when writing Go code involving goroutine lifecycle, context propagation, graceful shutdown, process management, HTTP clients, structured logging (slog), table-driven tests, consumer-side interfaces, or any Go architectural decision. Triggers on Go code, go.mod changes, or Go-related questions.
+paths:
+  - "**/*.go"
+  - "go.mod"
+  - "go.sum"
+---
+
+# Go Expert Patterns
+
+Advanced Go patterns for daemon/service projects. Read `references/patterns.md` for the full reference when implementing complex patterns.
+
+## Quick reference — most common patterns
+
+### Goroutine lifecycle (oklog/run)
+```go
+var g run.Group
+// Add actors: each is an (execute, interrupt) pair
+g.Add(func() error { return server.Run(ctx) }, func(error) { cancel() })
+g.Add(func() error { <-ctx.Done(); return nil }, func(error) { cancel() })
+err := g.Run() // blocks until first actor returns, then interrupts all others
+```
+
+### Graceful shutdown
+```go
+ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
+defer stop()
+// ... run services with ctx ...
+// On signal: ctx is cancelled, services stop, cleanup runs
+shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+defer cancel()
+if err := service.Shutdown(shutdownCtx); err != nil { return err }
+```
+
+### Consumer-side interface
+```go
+// In the CONSUMER package, not the producer:
+type store interface {
+    Get(ctx context.Context, id string) (*Thing, error)
+    Put(ctx context.Context, thing *Thing) error
+}
+// The producer returns a concrete struct that implicitly satisfies this.
+```
+
+### Process management (exec.Cmd)
+```go
+cmd := exec.CommandContext(ctx, path)
+cmd.Dir = workDir
+cmd.Env = append(os.Environ(), "KEY=value")
+cmd.Stdout = logFile
+cmd.Stderr = logFile
+if err := cmd.Start(); err != nil { return err }
+// Graceful stop:
+cmd.Process.Signal(syscall.SIGTERM)
+done := make(chan error, 1)
+go func() { done <- cmd.Wait() }()
+select {
+case <-done: // exited
+case <-time.After(10 * time.Second):
+    cmd.Process.Kill()
+    <-done
+}
+```
+
+### Structured logging (slog)
+```go
+logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelInfo}))
+groupLogger := logger.With("group", groupName)
+runnerLogger := groupLogger.With("runner", runnerName)
+runnerLogger.Info("job completed", "result", "success", "duration_s", 42)
+```
+
+For the full pattern library, read `references/patterns.md`.
diff --git a/.claude/skills/go-expert/references/patterns.md b/.claude/skills/go-expert/references/patterns.md
new file mode 100644
index 0000000..b2259e7
--- /dev/null
+++ b/.claude/skills/go-expert/references/patterns.md
@@ -0,0 +1,550 @@
+# Go Expert — Full Pattern Reference
+
+## Table of Contents
+
+1. [Goroutine lifecycle management](#1-goroutine-lifecycle)
+2. [Context propagation](#2-context-propagation)
+3. [Error handling patterns](#3-error-handling)
+4. [HTTP client patterns](#4-http-client)
+5. [Process management](#5-process-management)
+6. [Structured logging (slog)](#6-structured-logging)
+7. [Testing patterns](#7-testing)
+8. [Configuration loading](#8-configuration)
+9. [Concurrency patterns](#9-concurrency)
+10. [File system operations](#10-filesystem)
+
+---
+
+## 1. Goroutine lifecycle
+
+### oklog/run for daemon actors
+
+```go
+import "github.com/oklog/run"
+
+var g run.Group
+
+// Actor: long-running service
+{
+    ctx, cancel := context.WithCancel(context.Background())
+    g.Add(
+        func() error { return myService.Run(ctx) },  // execute
+        func(error) { cancel() },                      // interrupt
+    )
+}
+
+// Actor: signal handler
+{
+    ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
+    g.Add(
+        func() error { <-ctx.Done(); return nil },
+        func(error) { cancel() },
+    )
+}
+
+// When ANY actor returns, ALL others are interrupted via their interrupt func.
+if err := g.Run(); err != nil {
+    log.Fatal(err)
+}
+```
+
+### errgroup for bounded parallel work
+
+```go
+import "golang.org/x/sync/errgroup"
+
+g, ctx := errgroup.WithContext(ctx)
+g.SetLimit(10) // max 10 concurrent
+
+for _, item := range items { // Go 1.22+ gives each iteration its own item; on older toolchains add item := item
+    g.Go(func() error {
+        return process(ctx, item)
+    })
+}
+if err := g.Wait(); err != nil {
+    return err
+}
+```
+
+### Worker pool with backpressure
+
+```go
+type Pool struct {
+    sem chan struct{}
+    wg  sync.WaitGroup
+}
+
+func NewPool(size int) *Pool {
+    return &Pool{sem: make(chan struct{}, size)}
+}
+
+func (p *Pool) Go(fn func()) {
+    p.wg.Add(1)
+    p.sem <- struct{}{} // blocks if pool is full
+    go func() {
+        defer p.wg.Done()
+        defer func() { <-p.sem }()
+        fn()
+    }()
+}
+
+func (p *Pool) Wait() { p.wg.Wait() }
+```
+
+---
+
+## 2. Context propagation
+
+### Always pass context, never store it
+
+```go
+// YES
+func (s *Service) Process(ctx context.Context, id string) error { ... }
+
+// NO — storing context in a struct
+type Service struct {
+    ctx context.Context  // don't do this
+}
+```
+
+### context.WithoutCancel for cleanup operations
+
+```go
+// Cleanup must complete even if parent context is cancelled
+func (s *Service) Shutdown(ctx context.Context) {
+    cleanupCtx := context.WithoutCancel(ctx)
+    // or: cleanupCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+    s.cleanup(cleanupCtx)
+}
+```
+
+### Timeout per operation
+
+```go
+ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
+defer cancel()
+resp, err := client.Do(req.WithContext(ctx))
+```
+
+---
+
+## 3. Error handling
+
+### Sentinel errors
+
+```go
+var (
+    ErrNotFound    = errors.New("not found")
+    ErrConflict    = errors.New("conflict")
+    ErrTimeout     = errors.New("timeout")
+)
+
+// Usage:
+if errors.Is(err, ErrNotFound) { ... }
+```
+
+### Wrapping with context
+
+```go
+func (s *Service) GetUser(ctx context.Context, id string) (*User, error) {
+    user, err := s.store.Get(ctx, id)
+    if err != nil {
+        return nil, fmt.Errorf("get user %s: %w", id, err)
+    }
+    return user, nil
+}
+```
+
+### Custom error types
+
+```go
+type ValidationError struct {
+    Field   string
+    Message string
+}
+
+func (e *ValidationError) Error() string {
+    return fmt.Sprintf("validation: %s: %s", e.Field, e.Message)
+}
+
+// Check:
+var ve *ValidationError
+if errors.As(err, &ve) {
+    log.Printf("field %s: %s", ve.Field, ve.Message)
+}
+```
+
+---
+
+## 4. HTTP client
+
+### Client with timeout and connection pooling
+
+```go
+client := &http.Client{
+    Timeout: 30 * time.Second,
+    Transport: &http.Transport{
+        MaxIdleConns:        100,
+        MaxIdleConnsPerHost: 10,
+        IdleConnTimeout:     90 * time.Second,
+    },
+}
+```
+
+### Request with context
+
+```go
+req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
+if err != nil {
+    return fmt.Errorf("build request: %w", err)
+}
+req.Header.Set("Authorization", "Bearer "+token)
+req.Header.Set("Accept", "application/json")
+
+resp, err := client.Do(req)
+if err != nil {
+    return fmt.Errorf("request: %w", err)
+}
+defer resp.Body.Close()
+
+if resp.StatusCode >= 300 {
+    body, _ := io.ReadAll(resp.Body)
+    return fmt.Errorf("HTTP %d: %s", resp.StatusCode, string(body))
+}
+```
+
+### Exponential backoff with jitter
+
+```go
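+func backoff(attempt int, base, max time.Duration) time.Duration {
+    d := max
+    if attempt >= 0 && attempt < 63 && base <= max>>uint(attempt) {
+        d = base << uint(attempt) // cannot overflow: the result is at most max
+    }
+    jitter := time.Duration(rand.Int63n(int64(d/5) + 1)) // +1 keeps Int63n from panicking when d/5 == 0
+    return d + jitter - d/10
+}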
+```
+
+---
+
+## 5. Process management
+
+### Start with PID tracking
+
+```go
+cmd := exec.CommandContext(ctx, binPath)
+cmd.Dir = workDir
+cmd.Env = append(os.Environ(), envVars...)
+cmd.Stdout = logFile
+cmd.Stderr = logFile
+
+if err := cmd.Start(); err != nil {
+    return fmt.Errorf("start: %w", err)
+}
+
+// Write PID file
+pidPath := filepath.Join(workDir, ".pid")
+if err := os.WriteFile(pidPath, []byte(strconv.Itoa(cmd.Process.Pid)), 0o644); err != nil { return fmt.Errorf("write pid file: %w", err) }
+```
+
+### Graceful stop (SIGTERM → wait → SIGKILL)
+
+```go
+func stopProcess(cmd *exec.Cmd, timeout time.Duration) error {
+    if cmd.Process == nil {
+        return nil
+    }
+    if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
+        return cmd.Process.Kill()
+    }
+    done := make(chan error, 1)
+    go func() { done <- cmd.Wait() }()
+    select {
+    case err := <-done:
+        return err
+    case <-time.After(timeout):
+        return cmd.Process.Kill()
+    }
+}
+```
+
+### Check PID alive
+
+```go
+func pidAlive(pid int) bool {
+    if pid <= 0 {
+        return false
+    }
+    err := syscall.Kill(pid, 0)
+    return err == nil || errors.Is(err, syscall.EPERM)
+}
+```
+
+---
+
+## 6. Structured logging
+
+### slog with JSON handler
+
+```go
+handler := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
+    Level:     slog.LevelInfo,
+    AddSource: false,
+})
+logger := slog.New(handler)
+```
+
+### Logger hierarchy with context
+
+```go
+daemonLogger := logger.With("component", "daemon")
+groupLogger := daemonLogger.With("group", groupName)
+runnerLogger := groupLogger.With("runner", runnerName)
+
+runnerLogger.Info("job completed",
+    "job_id", jobID,
+    "result", "success",
+    "duration_s", elapsed.Seconds(),
+)
+```
+
+### Multi-handler (write to multiple destinations)
+
+```go
+type MultiHandler struct {
+    handlers []slog.Handler
+}
+
+func (m *MultiHandler) Enabled(ctx context.Context, level slog.Level) bool {
+    for _, h := range m.handlers {
+        if h.Enabled(ctx, level) {
+            return true
+        }
+    }
+    return false
+}
+
+func (m *MultiHandler) Handle(ctx context.Context, r slog.Record) (err error) {
+    for _, h := range m.handlers {
+        if h.Enabled(ctx, r.Level) {
+            err = errors.Join(err, h.Handle(ctx, r.Clone()))
+        }
+    }
+    return err
+}
+
+func (m *MultiHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
+    handlers := make([]slog.Handler, len(m.handlers))
+    for i, h := range m.handlers {
+        handlers[i] = h.WithAttrs(attrs)
+    }
+    return &MultiHandler{handlers: handlers}
+}
+
+func (m *MultiHandler) WithGroup(name string) slog.Handler {
+    handlers := make([]slog.Handler, len(m.handlers))
+    for i, h := range m.handlers {
+        handlers[i] = h.WithGroup(name)
+    }
+    return &MultiHandler{handlers: handlers}
+}
+```
+
+---
+
+## 7. Testing
+
+### Table-driven test
+
+```go
+func TestParseConfig(t *testing.T) {
+    tests := []struct {
+        name    string
+        input   string
+        want    *Config
+        wantErr string
+    }{
+        {
+            name:  "valid org scope",
+            input: "https://github.com/my-org",
+            want:  &Config{Scope: "org", Owner: "my-org"},
+        },
+        {
+            name:    "empty URL",
+            input:   "",
+            wantErr: "url is required",
+        },
+    }
+
+    for _, tt := range tests {
+        t.Run(tt.name, func(t *testing.T) {
+            got, err := ParseConfig(tt.input)
+            if tt.wantErr != "" {
+                require.ErrorContains(t, err, tt.wantErr)
+                return
+            }
+            require.NoError(t, err)
+            assert.Equal(t, tt.want, got)
+        })
+    }
+}
+```
+
+### HTTP test server
+
+```go
+func TestClient_ListRunners(t *testing.T) {
+    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+        assert.Equal(t, "/orgs/my-org/actions/runners", r.URL.Path)
+        assert.Equal(t, "Bearer test-token", r.Header.Get("Authorization"))
+        json.NewEncoder(w).Encode(map[string]any{
+            "runners": []map[string]any{
+                {"id": 1, "name": "runner-1", "status": "online"},
+            },
+        })
+    }))
+    defer srv.Close()
+
+    client := NewClient(srv.URL, "test-token")
+    runners, err := client.ListRunners(context.Background())
+    require.NoError(t, err)
+    assert.Len(t, runners, 1)
+}
+```
+
+---
+
+## 8. Configuration
+
+### YAML with defaults
+
+```go
+type Config struct {
+    Level   string `yaml:"level"`
+    Dir     string `yaml:"dir"`
+    MaxSize int    `yaml:"max_size"`
+}
+
+func (c *Config) applyDefaults() {
+    if c.Level == "" {
+        c.Level = "info"
+    }
+    if c.Dir == "" {
+        if os.Getuid() == 0 {
+            c.Dir = "/var/log/ghr"
+        } else {
+            home, _ := os.UserHomeDir()
+            c.Dir = filepath.Join(home, ".local", "share", "ghr", "logs")
+        }
+    }
+}
+```
+
+### Validation
+
+```go
+func (c *Config) Validate() error {
+    if len(c.Groups) == 0 {
+        return fmt.Errorf("at least one group is required")
+    }
+    for i, g := range c.Groups {
+        if g.Name == "" {
+            return fmt.Errorf("groups[%d].name is required", i)
+        }
+        if g.MaxRunners < 1 {
+            return fmt.Errorf("groups[%d].max_runners must be >= 1", i)
+        }
+        if g.MinRunners > g.MaxRunners {
+            return fmt.Errorf("groups[%d].min_runners (%d) > max_runners (%d)", i, g.MinRunners, g.MaxRunners)
+        }
+    }
+    return nil
+}
+```
+
+---
+
+## 9. Concurrency
+
+### Mutex-protected state
+
+```go
+type RunnerState struct {
+    mu   sync.Mutex
+    idle map[string]*Process
+    busy map[string]*Process
+}
+
+func (s *RunnerState) MarkBusy(name string) {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    proc, ok := s.idle[name]
+    if !ok {
+        return // log warning
+    }
+    delete(s.idle, name)
+    s.busy[name] = proc
+}
+
+func (s *RunnerState) Count() int {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    return len(s.idle) + len(s.busy)
+}
+
+func (s *RunnerState) Snapshot() []RunnerSnapshot {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    // Return a copy, not the map itself
+    out := make([]RunnerSnapshot, 0, len(s.idle)+len(s.busy))
+    for _, p := range s.idle {
+        out = append(out, p.Snapshot("idle"))
+    }
+    for _, p := range s.busy {
+        out = append(out, p.Snapshot("busy"))
+    }
+    return out
+}
+```
+
+---
+
+## 10. Filesystem
+
+### Safe directory copy
+
+```go
+func copyDir(src, dst string) error {
+    return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
+        if err != nil {
+            return err
+        }
+        rel, err := filepath.Rel(src, path)
+        if err != nil {
+            return err
+        }
+        target := filepath.Join(dst, rel)
+
+        if d.IsDir() {
+            return os.MkdirAll(target, 0o755)
+        }
+
+        info, err := d.Info()
+        if err != nil {
+            return err
+        }
+        return copyFile(path, target, info.Mode())
+    })
+}
+
+func copyFile(src, dst string, perm fs.FileMode) error {
+    in, err := os.Open(src)
+    if err != nil {
+        return err
+    }
+    defer in.Close()
+
+    out, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, perm)
+    if err != nil {
+        return err
+    }
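+
+    if _, err := io.Copy(out, in); err != nil {
+        out.Close() // best effort; the copy error is the one to report
+        return err
+    }
+    // Close explicitly so a failed flush is returned instead of dropped.
+    return out.Close()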
+}
+```
diff --git a/.claude/skills/pm/SKILL.md b/.claude/skills/pm/SKILL.md
new file mode 100644
index 0000000..bec02c3
--- /dev/null
+++ b/.claude/skills/pm/SKILL.md
@@ -0,0 +1,73 @@
+---
+name: pm
+description: Project manager orchestrator for ghr v2. Use this when the user wants to discuss features, plan work, create specs, or implement features with a structured workflow. Acts as a tech lead that delegates to specialized agents (spec-writer, developer, code-reviewer, spec-checker). Triggers on project planning, feature discussion, spec creation, implementation requests, or when the user says "pm", "project manager", "let's plan", "let's implement", or "new feature".
+---
+
+# Project Manager — ghr v2
+
+You are the **technical project manager** for ghr v2. You orchestrate the project by talking to the user and delegating to specialized agents.
+
+## Your role
+
+- You are the single point of contact for the user
+- You understand the full project vision (specs in `specs/`)
+- You make decisions, prioritize, and delegate
+- You never write code yourself — you delegate to agents
+- You track progress and report back concisely
+
+## How you work
+
+### When the user wants to discuss / brainstorm
+Talk directly. Ask questions. Challenge ideas. Push back if something is over-engineered or contradicts existing specs. Your goal: converge on a clear decision.
+
+### When the user wants a new spec
+Delegate to the **spec-writer** agent. But first:
+1. Have a conversation with the user to understand EXACTLY what they want
+2. Ask targeted questions (not open-ended dumps)
+3. Reference existing specs that might be impacted
+4. Once you have clarity, write a brief (~5 lines) for the spec-writer
+5. Spawn the spec-writer agent with the brief
+6. Review the output, show it to the user, iterate
+
+### When the user wants to implement a feature
+1. Identify which spec(s) cover this feature
+2. Break it down into implementation tasks (ordered by dependency)
+3. For each task, spawn the **developer** agent with precise instructions
+4. After implementation, spawn the **code-reviewer** agent
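+5. After review, spawn the **spec-checker** agent to verify the implementation against the spec; if tests are missing, send that task back to the **developer** agent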
+6. Run `/check` to validate the full pipeline
+7. Report results to the user
+
+### When the user asks about project status
+Read the specs, check which files exist in `internal/`, report what's done vs what's left.
+
+## Agents you can delegate to
+
+| Agent | Use for |
+|---|---|
+| **spec-writer** | Writing new specs or updating existing ones. Give it a clear brief. |
+| **developer** | Writing Go code. Give it the spec reference, files to create/modify, and expected behavior. |
+| **code-reviewer** | Reviewing code quality, Go idioms, spec compliance. Give it the files to review. |
+| **spec-checker** | Verifying implementation matches specs. Give it the spec + implementation files. |
+
+## Rules
+
+- **Never guess** — if you're unsure about the user's intent, ask
+- **One thing at a time** — don't overload agents with multiple unrelated tasks
+- **Show your plan** — before delegating, tell the user what you're about to do
+- **Keep context lean** — give agents only what they need, not the whole project history
+- **Specs are the source of truth** — always check specs before making decisions
+- **Flag contradictions** — if something conflicts with existing specs, surface it immediately
+
+## Current specs
+
+Read the relevant ones before any decision:
+- `specs/00-architecture.md` — package structure, interfaces, DI
+- `specs/01-core-scaleset.md` — scale set engine, scaler, runner manager
+- `specs/02-cli-commands.md` — CLI commands (start/stop/run/status/purge/login)
+- `specs/03-health-monitor.md` — health checks
+- `specs/04-logging.md` — structured logging
+- `specs/05-notifications.md` — Discord, webhooks
+- `specs/06-uptime-kuma.md` — push monitoring
+- `specs/07-config.md` — YAML config schema
+- `specs/08-auth.md` — authentication (login wizard)
diff --git a/.claude/skills/scaleset-sdk/SKILL.md b/.claude/skills/scaleset-sdk/SKILL.md
new file mode 100644
index 0000000..ddab90e
--- /dev/null
+++ b/.claude/skills/scaleset-sdk/SKILL.md
@@ -0,0 +1,137 @@
+---
+name: scaleset-sdk
+description: Build custom GitHub Actions runner autoscalers using the official actions/scaleset Go SDK. Use this skill whenever working with GitHub Actions Runner Scale Sets, implementing the Scaler interface, configuring JIT runners, managing scale set sessions, or building self-hosted runner infrastructure (including ghr). Triggers on any code importing "github.com/actions/scaleset", any mention of scale sets, JIT runner config, runner autoscaling, or self-hosted runner management with Go. Also use when debugging scale set authentication, message polling, or runner lifecycle issues.
+---
+
+# GitHub Actions Runner Scale Set SDK
+
+Complete reference for building custom autoscaling solutions with `github.com/actions/scaleset`.
+
+## When to use this skill
+
+- Writing Go code that imports `github.com/actions/scaleset` or `github.com/actions/scaleset/listener`
+- Implementing the `listener.Scaler` interface
+- Building a custom runner backend (process, VM, container)
+- Debugging scale set auth, polling, or runner lifecycle
+- Working on ghr (GitHub runner controller for macOS)
+
+## Quick start pattern
+
+Every scale set autoscaler follows this skeleton:
+
+```go
+// 1. Create client (PAT or GitHub App)
+client, _ := scaleset.NewClientWithPersonalAccessToken(scaleset.NewClientWithPersonalAccessTokenConfig{
+    GitHubConfigURL:     "https://github.com/my-org",
+    PersonalAccessToken: token,
+    SystemInfo:          scaleset.SystemInfo{System: "ghr", Version: "1.0"},
+})
+
+// 2. Create or get scale set
+scaleSet, _ := client.CreateRunnerScaleSet(ctx, &scaleset.RunnerScaleSet{
+    Name:          "my-runners",        // this IS the runs-on: label
+    RunnerGroupID: 1,                   // 1 = "default"
+    Labels:        []scaleset.Label{{Type: "System", Name: "my-runners"}},
+    RunnerSetting: scaleset.RunnerSetting{DisableUpdate: true},
+})
+defer client.DeleteRunnerScaleSet(context.WithoutCancel(ctx), scaleSet.ID)
+
+// 3. Open message session
+sessionClient, _ := client.MessageSessionClient(ctx, scaleSet.ID, hostname)
+defer sessionClient.Close(context.Background())
+
+// 4. Create listener + run with your Scaler
+l, _ := listener.New(sessionClient, listener.Config{
+    ScaleSetID: scaleSet.ID,
+    MaxRunners: 15,
+})
+l.Run(ctx, &MyScaler{})  // blocks until ctx cancelled or error
+```
+
+## The Scaler interface (the only thing you implement)
+
+```go
+type Scaler interface {
+    HandleDesiredRunnerCount(ctx context.Context, count int) (int, error)
+    HandleJobStarted(ctx context.Context, jobInfo *scaleset.JobStarted) error
+    HandleJobCompleted(ctx context.Context, jobInfo *scaleset.JobCompleted) error
+}
+```
+
+### HandleDesiredRunnerCount(ctx, count) (int, error)
+
+- `count` = `statistics.TotalAssignedJobs` (jobs needing runners RIGHT NOW)
+- Called VERY frequently: at init, after every message, after every long-poll timeout (~50s)
+- Return the actual runner count you scaled to (used for metrics only)
+- Any error terminates `Run()`
+- Scaling formula from the reference example: `target = min(maxRunners, minRunners + count)`
+- Scale-down is NOT done here — it happens in `HandleJobCompleted`
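+
+A minimal sketch of the scaling formula quoted above. The function name `targetRunners` and the sample numbers are illustrative assumptions, not part of the SDK:
+
+```go
+package main
+
+import "fmt"
+
+// targetRunners applies target = min(maxRunners, minRunners + count).
+// minRunners and maxRunners are hypothetical values owned by the caller;
+// count is the value passed to HandleDesiredRunnerCount.
+func targetRunners(minRunners, maxRunners, count int) int {
+    return min(maxRunners, minRunners+count)
+}
+
+func main() {
+    fmt.Println(targetRunners(1, 15, 3))  // 4: one warm runner plus three jobs
+    fmt.Println(targetRunners(1, 15, 40)) // 15: capped at maxRunners
+}
+```
+
+With `minRunners` above zero the target never drops to zero, so a warm runner stays available between jobs.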
+
+### HandleJobStarted(ctx, jobInfo) error
+
+- Mark the runner as busy (bookkeeping). No scaling action needed.
+- `jobInfo.RunnerName` identifies which runner got the job.
+- Any error terminates `Run()`
+
+### HandleJobCompleted(ctx, jobInfo) error
+
+- THIS is where scale-down happens: destroy the runner process/container/VM + cleanup workdir.
+- `jobInfo.RunnerName` identifies which runner to destroy.
+- `jobInfo.Result`: `"Succeeded"`, `"Failed"`, or `"Cancelled"` (cancelled = job reassignment, not a real completion)
+- Any error terminates `Run()`
+
+### Processing order within a single message batch
+
+1. AcquireJobs (automatic, not exposed to Scaler)
+2. All HandleJobStarted calls
+3. All HandleJobCompleted calls
+4. HandleDesiredRunnerCount
+
+JobCompleted runs BEFORE HandleDesiredRunnerCount. This is why the count naturally decreases after runners are cleaned up.
+
+## JIT Runner Config (replaces config.sh)
+
+```go
+jit, _ := client.GenerateJitRunnerConfig(ctx,
+    &scaleset.RunnerScaleSetJitRunnerSetting{Name: "runner-abc123"},
+    scaleSetID,
+)
+// jit.EncodedJITConfig is a base64 blob — treat as SECRET until consumed
+```
+
+The runner binary reads the JIT config from an env var instead of needing `config.sh`:
+
+```go
+cmd := exec.Command("./run.sh")
+cmd.Env = append(os.Environ(), "ACTIONS_RUNNER_INPUT_JITCONFIG="+jit.EncodedJITConfig)
+cmd.Start()
+```
+
+No `config.sh` step needed. No registration token management. The JIT config contains everything.
+
+## Authentication
+
+Read `references/api-reference.md` section "Authentication" for the full flow. Summary:
+
+- **GitHub App (recommended)**: `ClientID` + `InstallationID` + `PrivateKey` (PEM). Auto-rotates tokens.
+- **PAT**: simpler, broader scope. Pass as `PersonalAccessToken`.
+- Token exchange is automatic: PAT/App -> registration token -> admin token. Refresh is transparent (60s before expiry).
+
+## Key design facts
+
+1. **Scale set name = workflow label**. `runs-on: my-scale-set` targets the scale set named `my-scale-set`.
+2. **Runners are ephemeral by default**. One job, then removed.
+3. **Long-polling, not interval polling**. `GetMessage` blocks up to ~50s. React instantly to new jobs.
+4. **Message ack is optimistic**. Messages are deleted BEFORE your Scaler processes them.
+5. **`handleMessage` uses `context.WithoutCancel`**. Even during shutdown, message processing completes.
+6. **Scale set is deleted on daemon shutdown** (`defer DeleteRunnerScaleSet`). Clean state on restart.
+7. **Session token refresh is automatic**. 401 -> refresh -> retry (once). Transparent to your code.
+8. **Any Scaler error kills the listener loop**. Handle transient errors inside your Scaler.
+9. **SetMaxRunners is thread-safe**. Call it anytime to adjust capacity dynamically.
+10. **Go 1.25+ required**.
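+
+Point 8 means a transient failure should be retried inside the Scaler instead of being returned to the listener. One possible shape is a small retry helper. The name `retry` and its parameters are hypothetical and not provided by the SDK:
+
+```go
+package main
+
+import (
+    "context"
+    "errors"
+    "fmt"
+    "time"
+)
+
+// retry calls fn up to attempts times and waits delay between tries.
+// It returns nil on the first success and stops early if ctx is cancelled.
+func retry(ctx context.Context, attempts int, delay time.Duration, fn func() error) error {
+    var err error
+    for i := 0; i < attempts; i++ {
+        if err = fn(); err == nil {
+            return nil
+        }
+        if i == attempts-1 {
+            break // no point sleeping after the final attempt
+        }
+        select {
+        case <-ctx.Done():
+            return errors.Join(err, ctx.Err())
+        case <-time.After(delay):
+        }
+    }
+    return err
+}
+
+func main() {
+    calls := 0
+    err := retry(context.Background(), 3, 10*time.Millisecond, func() error {
+        calls++
+        if calls < 3 {
+            return errors.New("transient failure")
+        }
+        return nil
+    })
+    fmt.Println(calls, err)
+}
+```
+
+Only return an error from a Scaler method when stopping `Run()` is the intended outcome.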
+
+## Reference docs
+
+For detailed API signatures, types, error handling, and endpoint maps, read:
+- `references/api-reference.md` — Complete SDK reference (types, methods, auth, errors, endpoints)
+- `references/macos-adaptation.md` — How to adapt the Docker example to macOS process-based runners
diff --git a/.claude/skills/scaleset-sdk/references/api-reference.md b/.claude/skills/scaleset-sdk/references/api-reference.md
new file mode 100644
index 0000000..cb24b20
--- /dev/null
+++ b/.claude/skills/scaleset-sdk/references/api-reference.md
@@ -0,0 +1,515 @@
+# actions/scaleset — Complete API Reference
+
+## Table of Contents
+
+1. [Package constants](#1-package-constants)
+2. [Core types](#2-core-types)
+3. [Job message types](#3-job-message-types)
+4. [Client construction](#4-client-construction)
+5. [HTTP options](#5-http-options)
+6. [Authentication flow](#6-authentication-flow)
+7. [Client API methods](#7-client-api-methods)
+8. [MessageSessionClient](#8-messagesessionclient)
+9. [Listener package](#9-listener-package)
+10. [Error handling](#10-error-handling)
+11. [Config URL parsing](#11-config-url-parsing)
+12. [Full endpoint map](#12-full-endpoint-map)
+13. [Statistics fields](#13-statistics-fields)
+14. [Long-polling mechanics](#14-long-polling-mechanics)
+15. [Concurrency model](#15-concurrency-model)
+16. [Known limitations](#16-known-limitations)
+17. [Dependencies](#17-dependencies)
+
+---
+
+## 1. Package constants
+
+```go
+const HeaderScaleSetMaxCapacity = "X-ScaleSetMaxCapacity"
+const DefaultRunnerGroup = "default"
+
+type MessageType string
+const (
+    MessageTypeJobAvailable MessageType = "JobAvailable"
+    MessageTypeJobAssigned  MessageType = "JobAssigned"
+    MessageTypeJobStarted   MessageType = "JobStarted"
+    MessageTypeJobCompleted MessageType = "JobCompleted"
+)
+
+var ErrInvalidGitHubConfigURL = fmt.Errorf("invalid config URL, should point to an enterprise, org, or repository")
+```
+
+---
+
+## 2. Core types
+
+```go
+type RunnerScaleSet struct {
+    ID                 int                      `json:"id,omitempty"`
+    Name               string                   `json:"name,omitempty"`
+    RunnerGroupID      int                      `json:"runnerGroupId,omitempty"`
+    RunnerGroupName    string                   `json:"runnerGroupName,omitempty"`
+    Labels             []Label                  `json:"labels,omitempty"`
+    RunnerSetting      RunnerSetting            `json:"RunnerSetting,omitempty"`
+    CreatedOn          time.Time                `json:"createdOn,omitempty"`
+    RunnerJitConfigURL string                   `json:"runnerJitConfigUrl,omitempty"`
+    Statistics         *RunnerScaleSetStatistic `json:"statistics,omitempty"`
+}
+
+type Label struct {
+    Type string `json:"type"`  // "System" or empty (defaults to "System")
+    Name string `json:"name"`
+}
+
+type RunnerSetting struct {
+    DisableUpdate bool `json:"disableUpdate,omitempty"`
+}
+
+type RunnerGroup struct {
+    ID        int    `json:"id"`
+    Name      string `json:"name"`
+    Size      int    `json:"size"`
+    IsDefault bool   `json:"isDefaultGroup"`
+}
+
+type RunnerScaleSetSession struct {
+    SessionID               uuid.UUID                `json:"sessionId,omitempty"`
+    OwnerName               string                   `json:"ownerName,omitempty"`
+    RunnerScaleSet          *RunnerScaleSet          `json:"runnerScaleSet,omitempty"`
+    MessageQueueURL         string                   `json:"messageQueueUrl,omitempty"`
+    MessageQueueAccessToken string                   `json:"messageQueueAccessToken,omitempty"`
+    Statistics              *RunnerScaleSetStatistic `json:"statistics,omitempty"`
+}
+
+type RunnerScaleSetStatistic struct {
+    TotalAvailableJobs     int `json:"totalAvailableJobs"`
+    TotalAcquiredJobs      int `json:"totalAcquiredJobs"`
+    TotalAssignedJobs      int `json:"totalAssignedJobs"`   // THE scaling metric
+    TotalRunningJobs       int `json:"totalRunningJobs"`
+    TotalRegisteredRunners int `json:"totalRegisteredRunners"`
+    TotalBusyRunners       int `json:"totalBusyRunners"`
+    TotalIdleRunners       int `json:"totalIdleRunners"`
+}
+
+type RunnerScaleSetMessage struct {
+    MessageID            int
+    Statistics           *RunnerScaleSetStatistic
+    JobAvailableMessages []*JobAvailable
+    JobAssignedMessages  []*JobAssigned
+    JobStartedMessages   []*JobStarted
+    JobCompletedMessages []*JobCompleted
+}
+
+type RunnerScaleSetJitRunnerSetting struct {
+    Name       string `json:"name"`
+    WorkFolder string `json:"workFolder"`
+}
+
+type RunnerScaleSetJitRunnerConfig struct {
+    Runner           *RunnerReference `json:"runner"`
+    EncodedJITConfig string           `json:"encodedJITConfig"`
+}
+
+type RunnerReference struct {
+    ID               int    `json:"id"`
+    Name             string `json:"name"`
+    RunnerScaleSetID int    `json:"runnerScaleSetId"`
+}
+
+type SystemInfo struct {
+    System     string `json:"system"`
+    Version    string `json:"version"`
+    CommitSHA  string `json:"commit_sha"`
+    ScaleSetID int    `json:"scale_set_id"`
+    Subsystem  string `json:"subsystem"`
+}
+
+type GitHubAppAuth struct {
+    ClientID       string
+    InstallationID int64
+    PrivateKey     string  // PEM-formatted RSA private key
+}
+
+type ProxyFunc func(req *http.Request) (*url.URL, error)
+```
+
+---
+
+## 3. Job message types
+
+```go
+type JobMessageBase struct {
+    JobMessageType
+    RunnerRequestID    int64     `json:"runnerRequestId"`
+    RepositoryName     string    `json:"repositoryName"`
+    OwnerName          string    `json:"ownerName"`
+    JobID              string    `json:"jobId"`
+    JobWorkflowRef     string    `json:"jobWorkflowRef"`
+    JobDisplayName     string    `json:"jobDisplayName"`
+    WorkflowRunID      int64     `json:"workflowRunId"`
+    EventName          string    `json:"eventName"`
+    RequestLabels      []string  `json:"requestLabels"`
+    QueueTime          time.Time `json:"queueTime"`
+    ScaleSetAssignTime time.Time `json:"scaleSetAssignTime"`
+    RunnerAssignTime   time.Time `json:"runnerAssignTime"`
+    FinishTime         time.Time `json:"finishTime"`
+}
+
+type JobAvailable struct {
+    AcquireJobURL string `json:"acquireJobUrl"`
+    JobMessageBase
+}
+
+type JobAssigned struct {
+    JobMessageBase
+}
+
+type JobStarted struct {
+    RunnerID   int    `json:"runnerId"`
+    RunnerName string `json:"runnerName"`
+    JobMessageBase
+}
+
+type JobCompleted struct {
+    Result     string `json:"result"`  // "Succeeded", "Failed", "Cancelled"
+    RunnerID   int    `json:"runnerId"`
+    RunnerName string `json:"runnerName"`
+    JobMessageBase
+}
+```
+
+---
+
+## 4. Client construction
+
+```go
+// GitHub App (recommended)
+type ClientWithGitHubAppConfig struct {
+    GitHubConfigURL string
+    GitHubAppAuth   GitHubAppAuth
+    SystemInfo      SystemInfo
+}
+func NewClientWithGitHubApp(config ClientWithGitHubAppConfig, options ...HTTPOption) (*Client, error)
+
+// PAT
+type NewClientWithPersonalAccessTokenConfig struct {
+    GitHubConfigURL     string
+    PersonalAccessToken string
+    SystemInfo          SystemInfo
+}
+func NewClientWithPersonalAccessToken(config NewClientWithPersonalAccessTokenConfig, options ...HTTPOption) (*Client, error)
+```
+
+GitHubConfigURL examples:
+- Org: `https://github.com/my-org`
+- Repo: `https://github.com/my-org/my-repo`
+- Enterprise: `https://github.com/enterprises/my-enterprise`
+- GHES: `https://ghes.company.com/my-org`
+
+---
+
+## 5. HTTP options
+
+```go
+type HTTPOption func(*httpClientOption)
+
+func WithRetryMax(retryMax int) HTTPOption              // default: 4
+func WithRetryWaitMax(retryWaitMax time.Duration) HTTPOption  // default: 30s
+func WithTimeout(duration time.Duration) HTTPOption      // default: 5min
+func WithLogger(logger *slog.Logger) HTTPOption          // default: discard
+func WithRootCAs(rootCAs *x509.CertPool) HTTPOption      // custom CA pool
+func WithoutTLSVerify() HTTPOption                        // skip TLS verification
+func WithProxy(proxyFunc ProxyFunc) HTTPOption            // custom proxy
+func WithRetryableHTTPClint(client *retryablehttp.Client) HTTPOption  // NOTE: the misspelled name is the published API, so use it as written
+```
+
+---
+
+## 6. Authentication flow
+
+### GitHub App path (4 steps, all automatic)
+
+1. **Create JWT**: RS256 signed, iat = now-60s (clock skew), exp = iat+9min, iss = ClientID
+2. **Get installation access token**: `POST /app/installations/{id}/access_tokens` with Bearer JWT
+3. **Get registration token**: `POST /orgs/{org}/actions/runners/registration-token` (or /repos/ or /enterprises/) with Bearer access_token
+4. **Get admin connection**: `POST /actions/runner-registration` with `Authorization: RemoteAuth {registration_token}` — returns `ActionsServiceURL` + `AdminToken` (JWT)
+
+### PAT path (2 steps)
+
+1. **Get registration token**: same endpoint, with Bearer PAT directly
+2. **Get admin connection**: same as step 4 above
+
+### Token refresh
+
+`updateTokenIfNeeded()` runs before every Actions Service request. If the admin token expires within 60s, the full chain re-executes. Expiry is read from the JWT claims without verifying the signature (`ParseUnverified`).
+
+The admin connection request retries on 401 and 403 (propagation delays).
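+
+The refresh check can be sketched with the standard library alone. This is an illustration, not the SDK's code: `expiryOf` and `needsRefresh` are hypothetical names, and the sketch assumes `exp` is an integer number of seconds:
+
+```go
+package main
+
+import (
+    "encoding/base64"
+    "encoding/json"
+    "errors"
+    "fmt"
+    "strings"
+    "time"
+)
+
+// expiryOf reads the exp claim from a JWT without verifying the signature.
+func expiryOf(token string) (time.Time, error) {
+    parts := strings.Split(token, ".")
+    if len(parts) != 3 {
+        return time.Time{}, errors.New("token is not a three-part JWT")
+    }
+    payload, err := base64.RawURLEncoding.DecodeString(parts[1])
+    if err != nil {
+        return time.Time{}, err
+    }
+    var claims struct {
+        Exp int64 `json:"exp"`
+    }
+    if err := json.Unmarshal(payload, &claims); err != nil {
+        return time.Time{}, err
+    }
+    return time.Unix(claims.Exp, 0), nil
+}
+
+// needsRefresh reports whether exp falls within margin of now.
+func needsRefresh(exp, now time.Time, margin time.Duration) bool {
+    return !exp.After(now.Add(margin))
+}
+
+func main() {
+    now := time.Unix(1_700_000_000, 0)
+    payload := base64.RawURLEncoding.EncodeToString([]byte(`{"exp":1700000030}`))
+    exp, err := expiryOf("e30." + payload + ".sig") // header and signature are placeholders
+    if err != nil {
+        panic(err)
+    }
+    fmt.Println(needsRefresh(exp, now, 60*time.Second)) // true: expires in 30s
+}
+```
+
+Skipping signature verification is acceptable here only because the claim is used to schedule a refresh, not to make a trust decision.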
+
+---
+
+## 7. Client API methods
+
+All methods are thread-safe (mutex-protected).
+
+### Scale Set CRUD
+
+```go
+func (c *Client) CreateRunnerScaleSet(ctx, *RunnerScaleSet) (*RunnerScaleSet, error)
+// POST /_apis/runtime/runnerscalesets
+// Auto-adds label from Name if no labels provided. Errors if both Name and Labels empty.
+
+func (c *Client) GetRunnerScaleSet(ctx, runnerGroupID int, name string) (*RunnerScaleSet, error)
+// GET /_apis/runtime/runnerscalesets?runnerGroupId={id}&name={name}
+// Returns nil,nil if count=0. Error if count>1.
+
+func (c *Client) GetRunnerScaleSetByID(ctx, id int) (*RunnerScaleSet, error)
+// GET /_apis/runtime/runnerscalesets/{id}
+
+func (c *Client) UpdateRunnerScaleSet(ctx, id int, *RunnerScaleSet) (*RunnerScaleSet, error)
+// PATCH /_apis/runtime/runnerscalesets/{id}
+
+func (c *Client) DeleteRunnerScaleSet(ctx, id int) error
+// DELETE /_apis/runtime/runnerscalesets/{id} — expects 204
+```
+
+### Runner management
+
+```go
+func (c *Client) GetRunner(ctx, runnerID int) (*RunnerReference, error)
+func (c *Client) GetRunnerByName(ctx, name string) (*RunnerReference, error)  // nil,nil if not found
+func (c *Client) RemoveRunner(ctx, runnerID int64) error                      // expects 204
+```
+
+### JIT config
+
+```go
+func (c *Client) GenerateJitRunnerConfig(ctx, *RunnerScaleSetJitRunnerSetting, scaleSetID int) (*RunnerScaleSetJitRunnerConfig, error)
+// POST /_apis/runtime/runnerscalesets/{id}/generatejitconfig
+```
+
+### Runner group
+
+```go
+func (c *Client) GetRunnerGroupByName(ctx, name string) (*RunnerGroup, error)
+// The default group has ID 1, so hardcode 1 for "default" to skip this call.
+```
+
+### Message session
+
+```go
+func (c *Client) MessageSessionClient(ctx, scaleSetID int, owner string, options ...HTTPOption) (*MessageSessionClient, error)
+// Creates session immediately (POST). owner = hostname or UUID.
+```
+
+### Utility
+
+```go
+func (c *Client) SetSystemInfo(info SystemInfo)
+func (c *Client) SystemInfo() SystemInfo
+func (c *Client) DebugInfo() string  // JSON with HasProxy, HasRootCA, SystemInfo
+```
+
+---
+
+## 8. MessageSessionClient
+
+```go
+func (c *MessageSessionClient) GetMessage(ctx, lastMessageID, maxCapacity int) (*RunnerScaleSetMessage, error)
+// Long-polls. 200=message, 202=nil,nil (no messages). Auto-refreshes on 401.
+
+func (c *MessageSessionClient) DeleteMessage(ctx, messageID int) error
+// Ack. 204=success. Auto-refreshes on 401.
+
+func (c *MessageSessionClient) AcquireJobs(ctx, requestIDs []int64) ([]int64, error)
+// Claims jobs. Returns actually acquired IDs (may be subset).
+
+func (c *MessageSessionClient) Session() RunnerScaleSetSession
+// Returns copy of current session.
+
+func (c *MessageSessionClient) Close(ctx) error
+// Deletes session. Always call (use defer).
+```
+
+---
+
+## 9. Listener package
+
+```go
+import "github.com/actions/scaleset/listener"
+
+type Config struct {
+    ScaleSetID int
+    MaxRunners int
+    Logger     *slog.Logger
+}
+
+func New(client Client, config Config, options ...Option) (*Listener, error)
+func (l *Listener) Run(ctx context.Context, scaler Scaler) error
+func (l *Listener) SetMaxRunners(count int)  // thread-safe, takes effect on next poll
+
+type Scaler interface {
+    HandleDesiredRunnerCount(ctx context.Context, count int) (int, error)
+    HandleJobStarted(ctx context.Context, jobInfo *scaleset.JobStarted) error
+    HandleJobCompleted(ctx context.Context, jobInfo *scaleset.JobCompleted) error
+}
+
+type Client interface {
+    GetMessage(ctx context.Context, lastMessageID, maxCapacity int) (*scaleset.RunnerScaleSetMessage, error)
+    DeleteMessage(ctx context.Context, messageID int) error
+    AcquireJobs(ctx context.Context, requestIDs []int64) ([]int64, error)
+    Session() scaleset.RunnerScaleSetSession
+}
+
+type MetricsRecorder interface {
+    RecordStatistics(statistics *scaleset.RunnerScaleSetStatistic)
+    RecordJobStarted(msg *scaleset.JobStarted)
+    RecordJobCompleted(msg *scaleset.JobCompleted)
+    RecordDesiredRunners(count int)
+}
+
+func WithMetricsRecorder(recorder MetricsRecorder) Option
+```
+
+### Run() loop internals
+
+1. Read initial session statistics
+2. Call `HandleDesiredRunnerCount(ctx, initialStats.TotalAssignedJobs)`
+3. Loop:
+   - `GetMessage(ctx, lastMessageID, maxRunners)` — long-polls ~50s
+   - If nil: call `HandleDesiredRunnerCount` with cached stats, continue
+   - If message: ack (DeleteMessage) → AcquireJobs → HandleJobStarted(s) → HandleJobCompleted(s) → HandleDesiredRunnerCount
+   - Any error from Scaler: return error (terminates Run)
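+
+The per-message ordering above can be sketched as follows. This is a simplified illustration, not the listener's source: `message`, `scaler`, `handle`, and `recorder` are hypothetical stand-ins that pass runner names instead of the SDK's job structs, and the ack and AcquireJobs steps are left out:
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+)
+
+// message is a trimmed-down stand-in for RunnerScaleSetMessage.
+type message struct {
+    Assigned  int      // statistics.TotalAssignedJobs
+    Started   []string // runner names from JobStarted messages
+    Completed []string // runner names from JobCompleted messages
+}
+
+type scaler interface {
+    HandleDesiredRunnerCount(ctx context.Context, count int) (int, error)
+    HandleJobStarted(ctx context.Context, runner string) error
+    HandleJobCompleted(ctx context.Context, runner string) error
+}
+
+// handle applies one batch in order: started, completed, then desired count.
+// The first error aborts the batch, mirroring "any error terminates Run".
+func handle(ctx context.Context, s scaler, m message) error {
+    for _, r := range m.Started {
+        if err := s.HandleJobStarted(ctx, r); err != nil {
+            return err
+        }
+    }
+    for _, r := range m.Completed {
+        if err := s.HandleJobCompleted(ctx, r); err != nil {
+            return err
+        }
+    }
+    _, err := s.HandleDesiredRunnerCount(ctx, m.Assigned)
+    return err
+}
+
+// recorder is a test double that logs the order of calls.
+type recorder struct{ calls []string }
+
+func (r *recorder) HandleDesiredRunnerCount(_ context.Context, count int) (int, error) {
+    r.calls = append(r.calls, fmt.Sprintf("desired:%d", count))
+    return count, nil
+}
+
+func (r *recorder) HandleJobStarted(_ context.Context, runner string) error {
+    r.calls = append(r.calls, "started:"+runner)
+    return nil
+}
+
+func (r *recorder) HandleJobCompleted(_ context.Context, runner string) error {
+    r.calls = append(r.calls, "completed:"+runner)
+    return nil
+}
+
+func main() {
+    rec := &recorder{}
+    m := message{Assigned: 3, Started: []string{"r1"}, Completed: []string{"r2"}}
+    err := handle(context.Background(), rec, m)
+    fmt.Println(rec.calls, err)
+}
+```
+
+A recording double like this is also a convenient way to unit-test a real Scaler's bookkeeping without a live session.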
+
+---
+
+## 10. Error handling
+
+### Sentinel errors
+
+```go
+var RunnerNotFoundError           = scalesetError("runner not found")
+var RunnerExistsError             = scalesetError("runner exists")
+var JobStillRunningError          = scalesetError("job still running")
+var MessageQueueTokenExpiredError = scalesetError("message queue token expired")
+```
+
+Use `errors.Is(err, scaleset.RunnerNotFoundError)` etc.
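+
+A sketch of how such a check behaves, assuming the sentinels are values of a string-backed error type. `sentinelError`, `errRunnerNotFound`, and `removeRunner` are hypothetical stand-ins defined here so the example runs without the SDK:
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+)
+
+// sentinelError is a comparable error type, so errors.Is can match it by value.
+type sentinelError string
+
+func (e sentinelError) Error() string { return string(e) }
+
+var errRunnerNotFound = sentinelError("runner not found")
+
+// removeRunner simulates an API call that wraps the sentinel with context.
+func removeRunner(name string) error {
+    return fmt.Errorf("remove %q: %w", name, errRunnerNotFound)
+}
+
+func main() {
+    err := removeRunner("runner-1")
+    if errors.Is(err, errRunnerNotFound) {
+        // The runner is already gone, so cleanup can treat this as success.
+        fmt.Println("already removed")
+    }
+}
+```
+
+Matching with `errors.Is` keeps working when the error is wrapped with `%w`, which a direct `==` comparison would miss.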
+
+### Exception mapping
+
+Server returns JSON `{"typeName":"...", "message":"..."}`. Mapped:
+- `AgentExistsException` → `RunnerExistsError`
+- `AgentNotFoundException` → `RunnerNotFoundError`
+- `JobStillRunningException` → `JobStillRunningError`
+
+### Error metadata
+
+All HTTP errors include ActivityId and X-GitHub-Request-Id headers in the message.
+
+---
+
+## 11. Config URL parsing
+
+| URL pattern | Scope | Example |
+|---|---|---|
+| `github.com/{org}` | Organization | `https://github.com/my-org` |
+| `github.com/{org}/{repo}` | Repository | `https://github.com/my-org/my-repo` |
+| `github.com/enterprises/{name}` | Enterprise | `https://github.com/enterprises/my-ent` |
+| `ghes.example.com/{org}` | Org (GHES) | `https://ghes.corp.com/my-org` |
+
+API URL routing:
+- Hosted (github.com, *.ghe.com): `api.github.com` or `api.{host}`
+- GHES: `{host}/api/v3`
+- `GITHUB_ACTIONS_FORCE_GHES` env var forces GHES mode
+
+---
+
+## 12. Full endpoint map
+
+| Method | HTTP | Endpoint | Status |
+|---|---|---|---|
+| CreateRunnerScaleSet | POST | `/_apis/runtime/runnerscalesets` | 200 |
+| GetRunnerScaleSet | GET | `/_apis/runtime/runnerscalesets?runnerGroupId=&name=` | 200 |
+| GetRunnerScaleSetByID | GET | `/_apis/runtime/runnerscalesets/{id}` | 200 |
+| UpdateRunnerScaleSet | PATCH | `/_apis/runtime/runnerscalesets/{id}` | 200 |
+| DeleteRunnerScaleSet | DELETE | `/_apis/runtime/runnerscalesets/{id}` | 204 |
+| GetRunnerGroupByName | GET | `/_apis/runtime/runnergroups/?groupName=` | 200 |
+| GetRunner | GET | `/_apis/distributedtask/pools/0/agents/{id}` | 200 |
+| GetRunnerByName | GET | `/_apis/distributedtask/pools/0/agents?agentName=` | 200 |
+| RemoveRunner | DELETE | `/_apis/distributedtask/pools/0/agents/{id}` | 204 |
+| GenerateJitRunnerConfig | POST | `/_apis/runtime/runnerscalesets/{id}/generatejitconfig` | 200 |
+| createMessageSession | POST | `/_apis/runtime/runnerscalesets/{id}/sessions` | 200 |
+| deleteMessageSession | DELETE | `/_apis/runtime/runnerscalesets/{id}/sessions/{sessionId}` | 204 |
+| refreshMessageSession | PATCH | `/_apis/runtime/runnerscalesets/{id}/sessions/{sessionId}` | 200 |
+| AcquireJobs | POST | `/_apis/runtime/runnerscalesets/{id}/acquirejobs` | 200 |
+| GetMessage | GET | `{messageQueueURL}?lastMessageId=` | 200/202 |
+| DeleteMessage | DELETE | `{messageQueueURL}/{messageId}` | 204 |
+| Registration token (org) | POST | `/orgs/{org}/actions/runners/registration-token` | 201 |
+| Registration token (repo) | POST | `/repos/{owner}/{repo}/actions/runners/registration-token` | 201 |
+| Registration token (ent) | POST | `/enterprises/{ent}/actions/runners/registration-token` | 201 |
+| Access token (App) | POST | `/app/installations/{id}/access_tokens` | 201 |
+| Admin connection | POST | `/actions/runner-registration` | 2xx |
+
+---
+
+## 13. Statistics fields
+
+```go
+type RunnerScaleSetStatistic struct {
+    TotalAvailableJobs     int  // jobs waiting to be assigned
+    TotalAcquiredJobs      int  // jobs claimed by AcquireJobs
+    TotalAssignedJobs      int  // THE metric: jobs that need runners
+    TotalRunningJobs       int  // jobs currently executing
+    TotalRegisteredRunners int  // runners registered with GitHub
+    TotalBusyRunners       int  // runners currently running a job
+    TotalIdleRunners       int  // runners waiting for a job
+}
+```
+
+`TotalAssignedJobs >= TotalRunningJobs`. Use `TotalAssignedJobs` for scaling, NOT individual message counts (messages are capped at 50 per batch).
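+
+A minimal sketch of that rule, assuming the scaler derives its target as `min(maxRunners, minRunners + TotalAssignedJobs)`. The `statistic` type and the `desiredRunners` helper are illustrative names, not part of the SDK; the local type mirrors only the one field used here.
+
+```go
+package main
+
+import "fmt"
+
+// statistic mirrors only the field needed for scaling (hypothetical local
+// type; the SDK type is RunnerScaleSetStatistic).
+type statistic struct {
+    TotalAssignedJobs int
+}
+
+// desiredRunners is a hypothetical helper: it caps minRunners plus the
+// assigned-job count at maxRunners.
+func desiredRunners(s statistic, minRunners, maxRunners int) int {
+    return min(maxRunners, minRunners+s.TotalAssignedJobs)
+}
+
+func main() {
+    fmt.Println(desiredRunners(statistic{TotalAssignedJobs: 3}, 1, 10))  // 4
+    fmt.Println(desiredRunners(statistic{TotalAssignedJobs: 80}, 1, 10)) // 10
+}
+```
+
+Because the statistic is a server-side total, the result stays correct even when a single batch carries only part of the pending messages.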
+
+---
+
+## 14. Long-polling mechanics
+
+- `GetMessage` uses HTTP long-polling (~50s server-side timeout)
+- HTTP 200 = messages available (returned immediately)
+- HTTP 202 = no messages (the long poll timed out; `GetMessage` returns `nil, nil`)
+- `lastMessageId` query param prevents reprocessing
+- `X-ScaleSetMaxCapacity` header tells server your capacity
+- Messages not ack'd (DeleteMessage) are redelivered
+- Job reassignment: jobs can appear as JobAssigned → JobCompleted(Cancelled) up to 3 times with incremental delays
+
+---
+
+## 15. Concurrency model
+
+- `Client.mu sync.Mutex` — every public method acquires it
+- `MessageSessionClient.mu sync.Mutex` — separate mutex, every public method acquires it
+- `Listener.maxRunners atomic.Uint32` — SetMaxRunners is lock-free
+- When MessageSessionClient needs the parent Client (for token refresh), it explicitly acquires innerClient.mu
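+
+The lock-free setter pattern can be sketched as below. This is an illustration of `atomic.Uint32` usage only; the `capacity` type and its methods are hypothetical stand-ins and do not reproduce the SDK's `Listener`.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+    "sync/atomic"
+)
+
+// capacity is a hypothetical stand-in for a struct holding a max-runners
+// value that is written and read from different goroutines.
+type capacity struct {
+    maxRunners atomic.Uint32
+}
+
+// SetMaxRunners stores the new value without taking a mutex.
+func (c *capacity) SetMaxRunners(n uint32) { c.maxRunners.Store(n) }
+
+// MaxRunners loads the current value without taking a mutex.
+func (c *capacity) MaxRunners() uint32 { return c.maxRunners.Load() }
+
+func main() {
+    var c capacity
+    var wg sync.WaitGroup
+    for i := 0; i < 4; i++ {
+        wg.Add(1)
+        go func() {
+            defer wg.Done()
+            c.SetMaxRunners(8) // concurrent writers need no lock
+        }()
+    }
+    wg.Wait()
+    fmt.Println(c.MaxRunners()) // 8
+}
+```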
+
+---
+
+## 16. Known limitations
+
+1. **Public Preview** — interfaces may change
+2. **Go 1.25+ required**
+3. **Message batch cap of 50** — don't count individual messages for scaling
+4. **Silent label dropping on GHES < 3.21** without feature flag
+5. **Typo in API**: `WithRetryableHTTPClint` (missing 'e') — can't be fixed
+6. **HTTP defaults**: retryMax=4, retryWaitMax=30s, timeout=5min
+7. **All response bodies read into memory** (BOM-trimmed)
+8. **`GITHUB_ACTIONS_FORCE_GHES`** env var forces GHES mode (check existence, not value)
+
+---
+
+## 17. Dependencies
+
+| Package | Version | Role |
+|---|---|---|
+| golang-jwt/jwt/v4 | v4.5.2 | JWT signing/verification |
+| hashicorp/go-retryablehttp | v0.7.8 | HTTP retries |
+| google/uuid | v1.6.0 | Session IDs |
+| spf13/cobra | v1.10.2 | CLI framework (example) |
+| stretchr/testify | v1.11.1 | Testing |
diff --git a/.claude/skills/scaleset-sdk/references/macos-adaptation.md b/.claude/skills/scaleset-sdk/references/macos-adaptation.md
new file mode 100644
index 0000000..ca43e8f
--- /dev/null
+++ b/.claude/skills/scaleset-sdk/references/macos-adaptation.md
@@ -0,0 +1,200 @@
+# Adapting the Docker Example to macOS Process-Based Runners
+
+## What stays the same (no changes)
+
+Everything from the `scaleset` and `listener` packages is backend-agnostic:
+
+- Client creation (PAT or GitHub App)
+- Scale set CRUD
+- Message session + listener loop
+- `listener.Scaler` interface contract (same 3 methods, same semantics)
+- Scaling formula: `target = min(maxRunners, minRunners + count)`
+- HandleJobStarted: state transition (idle → busy)
+- Signal handling and shutdown flow
+- JIT config generation via `GenerateJitRunnerConfig`
+
+## What must change
+
+### 1. Replace Docker with exec.Command
+
+**Docker version:**
+```go
+c, _ := dockerClient.ContainerCreate(ctx, &container.Config{
+    Image: runnerImage,
+    User:  "runner",
+    Cmd:   []string{"/home/runner/run.sh"},
+    Env:   []string{"ACTIONS_RUNNER_INPUT_JITCONFIG=" + jit.EncodedJITConfig},
+}, nil, nil, nil, name)
+dockerClient.ContainerStart(ctx, c.ID, container.StartOptions{})
+```
+
+**macOS version:**
+```go
+cmd := exec.CommandContext(ctx, filepath.Join(workDir, "run.sh"))
+cmd.Dir = workDir
+cmd.Env = append(os.Environ(), "ACTIONS_RUNNER_INPUT_JITCONFIG="+jit.EncodedJITConfig)
+cmd.Stdout = os.Stdout
+cmd.Stderr = os.Stderr
+if err := cmd.Start(); err != nil {
+    return "", fmt.Errorf("start runner: %w", err)
+}
+```
+
+### 2. Runner state tracking
+
+**Docker version:**
+```go
+type runnerState struct {
+    mu   sync.Mutex
+    idle map[string]string  // name → containerID
+    busy map[string]string  // name → containerID
+}
+```
+
+**macOS version:**
+```go
+type runnerProcess struct {
+    cmd     *exec.Cmd
+    workDir string
+    pid     int
+}
+
+type runnerState struct {
+    mu   sync.Mutex
+    idle map[string]*runnerProcess  // name → process
+    busy map[string]*runnerProcess  // name → process
+}
+```
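+
+The snippets in this document call `addIdle` and `markDone` on `runnerState` without showing them. One possible shape for those helpers is sketched below; `newRunnerState` and `markBusy` are hypothetical names, and returning `nil` for an unknown runner is an assumption rather than confirmed behavior of the Docker example.
+
+```go
+package main
+
+import (
+    "fmt"
+    "os/exec"
+    "sync"
+)
+
+type runnerProcess struct {
+    cmd     *exec.Cmd
+    workDir string
+    pid     int
+}
+
+type runnerState struct {
+    mu   sync.Mutex
+    idle map[string]*runnerProcess
+    busy map[string]*runnerProcess
+}
+
+// newRunnerState is a hypothetical constructor that initializes both maps.
+func newRunnerState() *runnerState {
+    return &runnerState{
+        idle: map[string]*runnerProcess{},
+        busy: map[string]*runnerProcess{},
+    }
+}
+
+// addIdle registers a freshly started runner as idle.
+func (r *runnerState) addIdle(name string, p *runnerProcess) {
+    r.mu.Lock()
+    defer r.mu.Unlock()
+    r.idle[name] = p
+}
+
+// markBusy moves a runner from idle to busy; unknown names are ignored.
+func (r *runnerState) markBusy(name string) {
+    r.mu.Lock()
+    defer r.mu.Unlock()
+    if p, ok := r.idle[name]; ok {
+        delete(r.idle, name)
+        r.busy[name] = p
+    }
+}
+
+// markDone removes the runner from both maps and returns it,
+// or nil when the name is not tracked (assumed contract).
+func (r *runnerState) markDone(name string) *runnerProcess {
+    r.mu.Lock()
+    defer r.mu.Unlock()
+    p, ok := r.busy[name]
+    if !ok {
+        p = r.idle[name]
+    }
+    delete(r.busy, name)
+    delete(r.idle, name)
+    return p
+}
+
+func main() {
+    s := newRunnerState()
+    s.addIdle("runner-a", &runnerProcess{workDir: "/tmp/runner-a"})
+    s.markBusy("runner-a")
+    fmt.Println(s.markDone("runner-a").workDir) // /tmp/runner-a
+    fmt.Println(s.markDone("runner-a") == nil)  // true
+}
+```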
+
+### 3. HandleJobCompleted — cleanup
+
+**Docker version:**
+```go
+func (s *Scaler) HandleJobCompleted(ctx context.Context, jobInfo *scaleset.JobCompleted) error {
+    containerID := s.runners.markDone(jobInfo.RunnerName)
+    return s.dockerClient.ContainerRemove(ctx, containerID, container.RemoveOptions{Force: true})
+}
+```
+
+**macOS version:**
+```go
+func (s *Scaler) HandleJobCompleted(ctx context.Context, jobInfo *scaleset.JobCompleted) error {
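+    proc := s.runners.markDone(jobInfo.RunnerName)
+    if proc == nil {
+        // Assumes markDone returns nil for a runner that is not tracked.
+        return nil
+    }
+    if proc.cmd.Process != nil {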
+        _ = proc.cmd.Process.Kill()
+        _ = proc.cmd.Wait()
+    }
+    return os.RemoveAll(proc.workDir)
+}
+```
+
+### 4. Shutdown
+
+**Docker version:** `ContainerRemove(force: true)` for all containers.
+
+**macOS version:**
+```go
+func (s *Scaler) shutdown(ctx context.Context) {
+    s.runners.mu.Lock()
+    defer s.runners.mu.Unlock()
+    for _, proc := range s.runners.idle {
+        _ = proc.cmd.Process.Kill()
+        _ = proc.cmd.Wait()
+        _ = os.RemoveAll(proc.workDir)
+    }
+    for _, proc := range s.runners.busy {
+        _ = proc.cmd.Process.Kill()
+        _ = proc.cmd.Wait()
+        _ = os.RemoveAll(proc.workDir)
+    }
+    clear(s.runners.idle)
+    clear(s.runners.busy)
+}
+```
+
+### 5. Runner binary management
+
+Docker has the runner inside the image. On macOS you need:
+
+```go
+// Download + extract once at startup
+func (m *Manager) ensureRunnerBits(ctx context.Context, version string) (string, error) {
+    // Resolve "latest" → actual version via GitHub Releases API
+    // Download https://github.com/actions/runner/releases/download/v{ver}/actions-runner-osx-{arch}-{ver}.tar.gz
+    // Extract to cacheDir/{version}/
+    // Return path to extracted directory
+}
+
+// Copy cached bits to each runner's workdir
+func (m *Manager) prepareWorkdir(baseDir, runnerID string) (string, error) {
+    workDir := filepath.Join(baseDir, runnerID)
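+    if err := os.MkdirAll(workDir, 0o755); err != nil {
+        return "", fmt.Errorf("create workdir: %w", err)
+    }
+    // copyDir is assumed to return an error.
+    if err := copyDir(cachedRunnerDir, workDir); err != nil {
+        return "", fmt.Errorf("copy runner bits: %w", err)
+    }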
+    return workDir, nil
+}
+```
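+
+The download URL from the comment above could be assembled as in the following sketch. `runnerArch` and `runnerDownloadURL` are hypothetical helpers, and the mapping of Go's `amd64` to the `x64` token in asset names is an assumption to verify against the actual release assets.
+
+```go
+package main
+
+import (
+    "fmt"
+    "runtime"
+)
+
+// runnerArch maps a GOARCH value to the architecture token used in the
+// release asset name (assumption: amd64 is published as "x64").
+func runnerArch(goarch string) (string, error) {
+    switch goarch {
+    case "amd64":
+        return "x64", nil
+    case "arm64":
+        return "arm64", nil
+    default:
+        return "", fmt.Errorf("unsupported architecture %q", goarch)
+    }
+}
+
+// runnerDownloadURL fills the URL template for a pinned version such as
+// "2.330.0"; resolving "latest" is out of scope for this sketch.
+func runnerDownloadURL(version, goarch string) (string, error) {
+    arch, err := runnerArch(goarch)
+    if err != nil {
+        return "", err
+    }
+    return fmt.Sprintf(
+        "https://github.com/actions/runner/releases/download/v%s/actions-runner-osx-%s-%s.tar.gz",
+        version, arch, version,
+    ), nil
+}
+
+func main() {
+    u, err := runnerDownloadURL("2.330.0", runtime.GOARCH)
+    if err != nil {
+        fmt.Println("error:", err)
+        return
+    }
+    fmt.Println(u)
+}
+```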
+
+### 6. Config changes
+
+Remove:
+- `RunnerImage` field
+
+Add:
+- `RunnerVersion` (string: "latest" or pinned like "2.330.0")
+- `CacheDir` (path for cached runner binaries)
+- `WorkdirBase` (base path for runner workdirs)
+
+## Complete startRunner for macOS
+
+```go
+func (s *Scaler) startRunner(ctx context.Context) (string, error) {
+    name := fmt.Sprintf("runner-%s", randHex(4))
+
+    // 1. Generate JIT config
+    jit, err := s.scalesetClient.GenerateJitRunnerConfig(ctx,
+        &scaleset.RunnerScaleSetJitRunnerSetting{Name: name},
+        s.scaleSetID,
+    )
+    if err != nil {
+        return "", fmt.Errorf("generate JIT config: %w", err)
+    }
+
+    // 2. Prepare workdir (copy cached runner bits)
+    workDir, err := s.manager.prepareWorkdir(s.workdirBase, name)
+    if err != nil {
+        return "", fmt.Errorf("prepare workdir: %w", err)
+    }
+
+    // 3. Start runner process with JIT config
+    cmd := exec.CommandContext(ctx, filepath.Join(workDir, "run.sh"))
+    cmd.Dir = workDir
+    cmd.Env = append(os.Environ(), "ACTIONS_RUNNER_INPUT_JITCONFIG="+jit.EncodedJITConfig)
+    cmd.Stdout = os.Stdout
+    cmd.Stderr = os.Stderr
+
+    if err := cmd.Start(); err != nil {
+        _ = os.RemoveAll(workDir)
+        return "", fmt.Errorf("start runner: %w", err)
+    }
+
+    // 4. Track
+    s.runners.addIdle(name, &runnerProcess{
+        cmd:     cmd,
+        workDir: workDir,
+        pid:     cmd.Process.Pid,
+    })
+
+    return name, nil
+}
+```
+
+## Architecture comparison
+
+| Layer | Docker | macOS |
+|---|---|---|
+| Runner backend | Container | exec.Cmd process |
+| Config delivery | JITCONFIG env var | Same env var |
+| State tracking | name → containerID | name → *runnerProcess |
+| Scale up | ContainerCreate + Start | exec.Command + Start |
+| Scale down | ContainerRemove(force) | Kill + Wait + RemoveAll |
+| Shutdown | Force remove all | Kill all + cleanup dirs |
+| Isolation | Container | Filesystem (workdirs) |
+| Runner binary | Inside Docker image | Downloaded + cached |
diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml
new file mode 100644
index 0000000..fd6bdda
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,59 @@
+name: Bug Report
+description: Report a bug
+labels: [bug]
+body:
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: What happened?
+    validations:
+      required: true
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+      description: What did you expect to happen?
+    validations:
+      required: true
+  - type: textarea
+    id: reproduce
+    attributes:
+      label: Steps to reproduce
+      description: How can we reproduce this?
+    validations:
+      required: true
+  - type: input
+    id: version
+    attributes:
+      label: ghr version
+      description: Output of `ghr version`
+      placeholder: "e.g. ghr 1.0.0 (commit: abc1234, built: 2026-01-01T00:00:00Z)"
+    validations:
+      required: true
+  - type: dropdown
+    id: os
+    attributes:
+      label: macOS version
+      options:
+        - macOS 15 (Sequoia)
+        - macOS 14 (Sonoma)
+        - macOS 13 (Ventura)
+        - Other
+    validations:
+      required: true
+  - type: dropdown
+    id: arch
+    attributes:
+      label: Architecture
+      options:
+        - Apple Silicon (arm64)
+        - Intel (amd64)
+    validations:
+      required: true
+  - type: textarea
+    id: logs
+    attributes:
+      label: Relevant logs
+      description: Paste any relevant log output
+      render: shell
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 0000000..3ba13e0
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
+blank_issues_enabled: false
diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml
new file mode 100644
index 0000000..fba5a47
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,23 @@
+name: Feature Request
+description: Suggest a feature
+labels: [enhancement]
+body:
+  - type: textarea
+    id: problem
+    attributes:
+      label: Problem
+      description: What problem does this solve?
+    validations:
+      required: true
+  - type: textarea
+    id: solution
+    attributes:
+      label: Proposed solution
+      description: How would you like it to work?
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives considered
+      description: Any other approaches you've thought about?
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
new file mode 100644
index 0000000..fbc93bc
--- /dev/null
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,69 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+concurrency:
+  group: ci-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  lint:
+    name: Lint
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5.5.0
+        with:
+          go-version-file: go.mod
+      - uses: golangci/golangci-lint-action@4afd733a84b1f43292c63897423277bb7f4313a9 # v8.0.0
+        with:
+          version: latest
+
+  vet:
+    name: Vet & Format
+    runs-on: ubuntu-latest
+    needs: lint
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5.5.0
+        with:
+          go-version-file: go.mod
+      - run: go vet ./...
+      - run: make fmt-check
+
+  build:
+    name: Build
+    runs-on: ubuntu-latest
+    needs: vet
+    strategy:
+      matrix:
+        goos: [darwin, linux]
+        goarch: [amd64, arm64]
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5.5.0
+        with:
+          go-version-file: go.mod
+      - run: make build
+        env:
+          GOOS: ${{ matrix.goos }}
+          GOARCH: ${{ matrix.goarch }}
+          CGO_ENABLED: "0"
+
+  test:
+    name: Test
+    runs-on: ${{ matrix.os }}
+    needs: build
+    strategy:
+      matrix:
+        os: [macos-latest, ubuntu-latest]
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5.5.0
+        with:
+          go-version-file: go.mod
+      - run: make test
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
new file mode 100644
index 0000000..f11f713
--- /dev/null
+++ b/.github/workflows/release.yml
@@ -0,0 +1,27 @@
+name: Release
+
+on:
+  push:
+    tags:
+      - "v*.*.*"
+
+permissions:
+  contents: write
+
+jobs:
+  release:
+    name: Release
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5.5.0
+        with:
+          go-version-file: go.mod
+      - uses: goreleaser/goreleaser-action@9ed2f89a662bf1735a48bc8557fd212fa902bebf # v6.3.0
+        with:
+          version: latest
+          args: release --clean
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
diff --git a/.gitignore b/.gitignore
index 98fe80e..47bf342 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,21 +1,43 @@
-.serena
-*.md
-.DS_Store
-.gocache/
-.env
-.env.*
-config.yaml
-*.log
-*.out
-*.test
-coverage.*
+# Binary
+/ghr
+/ghr.exe
+/v2/ghr
+/v2/ghr.exe
+
+# Build
 bin/
 dist/
-ghr.exe
-**/*.exe
-**/*.dll
-**/*.so
-**/*.dylib
+
+# Go
+*.test
+*.out
+coverage.*
+
+# Environment & secrets
+.env
+.env.*
+credentials.json
+*.pem
+
+# OS
+.DS_Store
+Thumbs.db
+
+# IDE
 .vscode/
 .idea/
-ghr
+*.swp
+*.swo
+*~
+
+# Claude Code
+.claude/settings.local.json
+.serena
+
+# Logs (local dev)
+*.log
+
+# Runner workdirs (local dev)
+runners/
+cache/
+state/
diff --git a/.golangci.yml b/.golangci.yml
new file mode 100644
index 0000000..32a6311
--- /dev/null
+++ b/.golangci.yml
@@ -0,0 +1,86 @@
+version: "2"
+
+run:
+  timeout: 5m
+
+linters:
+  enable:
+    - errcheck
+    - govet
+    - ineffassign
+    - staticcheck
+    - unused
+    - gocritic
+    - misspell
+    - nolintlint
+    - prealloc
+    - revive
+    - unconvert
+    - unparam
+    - errorlint
+    - bodyclose
+    - contextcheck
+    - nilerr
+    - exhaustive
+
+  settings:
+    errcheck:
+      exclude-functions:
+        - (net/http.ResponseWriter).Write
+        - (*log/slog.Logger).Info
+        - (*log/slog.Logger).Debug
+        - (*log/slog.Logger).Warn
+        - (*log/slog.Logger).Error
+
+    gocritic:
+      enabled-tags:
+        - diagnostic
+        - style
+        - performance
+
+    revive:
+      rules:
+        - name: blank-imports
+        - name: context-as-argument
+        - name: context-keys-type
+        - name: error-return
+        - name: error-strings
+        - name: error-naming
+        - name: if-return
+        - name: increment-decrement
+        - name: var-naming
+        - name: range
+        - name: receiver-naming
+        - name: time-naming
+        - name: unexported-return
+        - name: indent-error-flow
+        - name: errorf
+        - name: empty-block
+        - name: superfluous-else
+        - name: unused-parameter
+        - name: unreachable-code
+
+    exhaustive:
+      default-signifies-exhaustive: true
+
+  exclusions:
+    presets:
+      - std-error-handling
+    rules:
+      - path: _test\.go
+        linters:
+          - errcheck
+          - gocritic
+          - unparam
+          - revive
+      - linters:
+          - gocritic
+        text: "hugeParam: r is heavy"
+        path: internal/logging/handler\.go
+      - linters:
+          - gocritic
+        text: "filepathJoin"
+        path: internal/launchd/
+      - linters:
+          - nilerr
+        path: internal/cli/auth\.go
diff --git a/.goreleaser.yml b/.goreleaser.yml
new file mode 100644
index 0000000..62a4b70
--- /dev/null
+++ b/.goreleaser.yml
@@ -0,0 +1,49 @@
+version: 2
+
+builds:
+  - main: ./cmd/ghr
+    binary: ghr
+    env:
+      - CGO_ENABLED=0
+    goos:
+      - darwin
+      - linux
+    goarch:
+      - amd64
+      - arm64
+    ldflags:
+      - -s -w
+      - -X '{{ .ModulePath }}/internal/cli.version={{ .Version }}'
+      - -X '{{ .ModulePath }}/internal/cli.commit={{ .ShortCommit }}'
+      - -X '{{ .ModulePath }}/internal/cli.date={{ .Date }}'
+
+archives:
+  - formats: [tar.gz]
+    name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}"
+
+checksum:
+  name_template: checksums.txt
+
+changelog:
+  sort: asc
+  filters:
+    exclude:
+      - "^docs:"
+      - "^style:"
+      - "^chore\\(deps\\):"
+  groups:
+    - title: Features
+      regexp: "^feat"
+    - title: Bug Fixes
+      regexp: "^fix"
+    - title: Refactoring
+      regexp: "^refactor"
+    - title: Other
+      order: 999
+
+release:
+  github:
+    owner: RedBoardDev
+    name: gh-runners-tool
+  prerelease: auto
+  name_template: "v{{ .Version }}"
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..8a6e6f2
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,74 @@
+# ghr — GitHub Actions Runner Controller for macOS
+
+## Project
+
+Self-hosted GitHub Actions runner controller built on the official `actions/scaleset` Go SDK. Manages ephemeral runners via JIT configs, scale sets, and long-polling. Targets macOS (Apple Silicon + Intel).
+
+## Quick Reference
+
+```bash
+go build ./cmd/ghr              # build
+go test ./...                    # test all
+go test -race ./...              # test with race detector
+go vet ./...                     # static analysis
+gofmt -w .                       # format
+golangci-lint run                # lint (if installed)
+```
+
+## Architecture
+
+Package-by-feature under `internal/`. No DDD layers. See `specs/00-architecture.md`.
+
+```
+cmd/ghr/main.go       → wiring, DI, CLI
+internal/cli/          → Cobra commands (thin)
+internal/auth/         → credentials (login, load, save)
+internal/config/       → YAML + env loading
+internal/controller/   → scale set orchestration + Scaler
+internal/runner/       → binary download + process management
+internal/github/       → scaleset SDK adapter
+internal/health/       → health monitoring
+internal/notification/ → event-driven alerts (Discord, webhooks)
+internal/monitoring/   → push-based reporters (Uptime Kuma)
+internal/api/          → Unix socket JSON API (IPC for ghr status)
+internal/launchd/      → macOS service management
+internal/logging/      → slog multi-writer, rotation
+internal/model/        → shared structs only (no interfaces, no logic)
+```
+
+## Code Conventions
+
+- Go 1.25+ required (for actions/scaleset SDK)
+- Interfaces defined where consumed, not where implemented
+- Structs with exported fields, not getter interfaces
+- Error wrapping: `fmt.Errorf("context: %w", err)`
+- `oklog/run` for daemon goroutine lifecycle
+- `context.Context` as first param everywhere
+- No `any` without justification, no `_` to ignore errors
+- Table-driven tests with `t.Run` subtests
+
+## Commit Convention
+
+`type(scope): description` — types: feat, fix, docs, refactor, test, chore
+
+## Key Dependencies
+
+- `github.com/actions/scaleset` — Scale Set API + listener
+- `github.com/spf13/cobra` — CLI
+- `github.com/oklog/run` — goroutine lifecycle
+- `github.com/joho/godotenv` — .env loading
+- `gopkg.in/yaml.v3` — config
+- `log/slog` (stdlib) — structured logging
+
+## Specs
+
+All specs in `specs/`. Read before implementing:
+- `00-architecture.md` — package structure, interfaces, DI wiring
+- `01-core-scaleset.md` — scale set engine, scaler, runner manager
+- `02-cli-commands.md` — start/stop/run/status/purge/login
+- `03-health-monitor.md` — health checks, issue detection
+- `04-logging.md` — structured logging, rotation, per-runner files
+- `05-notifications.md` — Discord, webhook providers
+- `06-uptime-kuma.md` — push monitoring
+- `07-config.md` — YAML schema, validation, defaults
+- `08-auth.md` — login wizard, credentials file, resolution order
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..c30c84d
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,46 @@
+BINARY    := ghr
+MODULE    := github.com/RedBoardDev/gh-runners-tool/v2
+CMD       := ./cmd/ghr
+VERSION   ?= $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
+COMMIT    ?= $(shell git rev-parse --short HEAD 2>/dev/null || echo "none")
+DATE      ?= $(shell date -u '+%Y-%m-%dT%H:%M:%SZ')
+LDFLAGS   := -s -w -X '$(MODULE)/internal/cli.version=$(VERSION)' -X '$(MODULE)/internal/cli.commit=$(COMMIT)' -X '$(MODULE)/internal/cli.date=$(DATE)'
+
+.PHONY: build test lint vet fmt fmt-check vuln clean install snapshot ci help
+
+build: ## Build the binary
+	go build -ldflags "$(LDFLAGS)" -o $(BINARY) $(CMD)
+
+test: ## Run tests with race detector
+	go test -race -count=1 ./...
+
+lint: ## Run golangci-lint
+	golangci-lint run
+
+vet: ## Run go vet
+	go vet ./...
+
+fmt: ## Format code
+	gofmt -w .
+
+fmt-check: ## Check formatting (CI)
+	@test -z "$$(gofmt -l .)" || (echo "Files not formatted:" && gofmt -l . && exit 1)
+
+vuln: ## Run govulncheck
+	govulncheck ./...
+
+clean: ## Remove build artifacts
+	rm -rf $(BINARY) dist/
+
+install: ## Install locally via go install
+	go install -ldflags "$(LDFLAGS)" $(CMD)
+
+snapshot: ## Build a snapshot release (no publish)
+	goreleaser release --snapshot --clean
+
+ci: lint vet fmt-check build test vuln ## Run all CI checks locally
+
+help: ## Show this help
+	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}'
+
+.DEFAULT_GOAL := help
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..4ea81ad
--- /dev/null
+++ b/README.md
@@ -0,0 +1,138 @@
+# ghr - GitHub Actions Runner Controller for macOS
+
+[![Go](https://img.shields.io/badge/Go-1.25+-00ADD8?logo=go&logoColor=white)](https://go.dev/)
+[![GitHub Actions](https://img.shields.io/badge/GitHub%20Actions-Runner%20Controller-2088FF?logo=githubactions&logoColor=white)](https://github.com/features/actions)
+[![macOS](https://img.shields.io/badge/macOS-Apple%20Silicon%20%7C%20Intel-000000?logo=apple&logoColor=white)](https://www.apple.com/macos/)
+
+## Overview
+
+**ghr** is a self-hosted GitHub Actions runner controller built on the official [`actions/scaleset`](https://github.com/actions/scaleset) Go SDK. It manages ephemeral runners via JIT configs, scale sets, and long-polling - targeting macOS (Apple Silicon and Intel).
+
+Define runner groups with min/max scaling in a YAML config, and ghr handles binary downloads, runner registration, process lifecycle, health monitoring, and graceful shutdown. It integrates with macOS `launchd` for service management and supports Discord/webhook notifications and Uptime Kuma push monitoring.
+
+### Key Features
+
+- **Scale Set orchestration** - Runner groups with configurable min/max scaling via the official GitHub SDK
+- **Ephemeral JIT runners** - Provisioned on-demand with just-in-time configs, cleaned up after each job
+- **macOS native** - First-class `launchd` integration (`ghr start/stop/restart/status`)
+- **YAML configuration** - Single config file with environment variables for secrets
+- **Health monitoring** - Detection of stuck runners, resource issues, and connectivity problems
+- **Notifications** - Discord and webhook alerts for runner events
+- **Uptime Kuma** - Push-based monitoring integration
+- **Structured logging** - `slog`-based with file rotation and per-runner log files
+
+## Getting Started
+
+### Prerequisites
+
+- **Go 1.25+** (required by the `actions/scaleset` SDK)
+- **macOS** (Apple Silicon or Intel)
+- A GitHub organization or repository with self-hosted runner access
+- A GitHub PAT or App credentials with runner management permissions
+
+### Build
+
+```bash
+go build -o ghr ./cmd/ghr
+```
+
+### Configuration
+
+Create a `config.yaml`:
+
+```yaml
+github:
+  url: "https://github.com/my-org"
+  runner_group: "default"
+
+runner:
+  version: "latest"
+  cache_dir: "/var/lib/ghr/cache"
+  workdir_base: "/var/lib/ghr/runners"
+
+groups:
+  - name: "ci-runners"
+    max_runners: 10
+    min_runners: 2
+    labels: ["ci", "macos"]
+
+  - name: "deploy-runners"
+    max_runners: 2
+    labels: ["deploy", "macos"]
+```
+
+Authentication is handled via `ghr login`. Tokens are never stored in the config file - use environment variables or the credentials store.
+
+### Usage
+
+```bash
+# Authenticate with GitHub
+ghr login
+
+# Start as a launchd service (daemon)
+ghr start --config config.yaml
+
+# Run in foreground (debug mode)
+ghr run --config config.yaml
+
+# Check status
+ghr status
+
+# Restart after config changes
+ghr restart
+
+# Stop the daemon
+ghr stop
+
+# Emergency reset (kill all runners, clean workdirs)
+ghr purge
+```
+
+### Run Tests
+
+```bash
+go test ./...              # all tests
+go test -race ./...        # with race detector
+go vet ./...               # static analysis
+golangci-lint run          # lint (if installed)
+```
+
+## Repository Structure
+
+```
+ghr/
+├── cmd/ghr/main.go            # Entrypoint
+├── internal/
+│   ├── cli/                    # Cobra commands
+│   ├── auth/                   # Credentials management
+│   ├── config/                 # YAML + env config
+│   ├── runner/                 # Binary download & process lifecycle
+│   ├── github/                 # Scale set SDK adapter
+│   ├── model/                  # Shared data structs
+│   └── logging/                # Structured logging
+├── go.mod
+└── go.sum
+```
+
+## Key Dependencies
+
+| Package | Purpose |
+|---------|---------|
+| [`actions/scaleset`](https://github.com/actions/scaleset) | Official GitHub Scale Set API + listener |
+| [`spf13/cobra`](https://github.com/spf13/cobra) | CLI framework |
+| [`oklog/run`](https://github.com/oklog/run) | Goroutine lifecycle management |
+| [`joho/godotenv`](https://github.com/joho/godotenv) | `.env` file loading |
+| `gopkg.in/yaml.v3` | YAML config parsing |
+| `log/slog` (stdlib) | Structured logging |
+
+## Reporting Issues
+
+[GitHub Issues](https://github.com/RedBoardDev/gh-runners-tool/issues)
+
+## License
+
+Proprietary. All rights reserved.
+
+## Contact
+
+- GitHub: [@RedBoardDev](https://github.com/RedBoardDev)
diff --git a/VERSION.md b/VERSION.md
new file mode 100644
index 0000000..892397a
--- /dev/null
+++ b/VERSION.md
@@ -0,0 +1,41 @@
+# Versioning & Releases
+
+ghr uses git tags as the single source of truth for versioning. No version file to maintain.
+
+## Creating a release
+
+```bash
+git tag v1.0.0
+git push --tags
+```
+
+Pushing the tag triggers the release workflow, which builds binaries for darwin and linux (amd64 and arm64) and publishes a GitHub Release with archives and checksums.
+
+## Version format
+
+Follow [semver](https://semver.org):
+
+| Tag | When |
+|---|---|
+| `v1.0.0` | First stable release |
+| `v1.1.0` | New feature, backward compatible |
+| `v1.0.1` | Bug fix |
+| `v2.0.0` | Breaking change |
+| `v1.0.0-rc.1` | Pre-release (marked automatically) |
+
+## How it works
+
+The version is injected at build time via Go ldflags. The Makefile and GoReleaser both inject `version`, `commit`, and `date` into the binary. Running `ghr version` prints these values.
+
+When building manually without ldflags (`go build ./cmd/ghr`), the version defaults to `dev`.
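+
+A sketch of how the receiving side might look, assuming package-level string variables that the `-X` flags overwrite. Only the `dev` default is stated above; the `commit` and `date` defaults and the `versionString` helper are illustrative assumptions, and the output layout follows the placeholder in the bug report template.
+
+```go
+package main
+
+import "fmt"
+
+// These variables would be overwritten at link time via
+// -X '<module>/internal/cli.version=...' and friends.
+var (
+    version = "dev"
+    commit  = "none"    // assumed default
+    date    = "unknown" // assumed default
+)
+
+// versionString is a hypothetical helper that formats the injected values.
+func versionString() string {
+    return fmt.Sprintf("ghr %s (commit: %s, built: %s)", version, commit, date)
+}
+
+func main() {
+    fmt.Println(versionString()) // ghr dev (commit: none, built: unknown)
+}
+```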
+
+## Fixing a bad tag
+
+```bash
+git tag -d v1.0.0                   # delete locally
+git push --delete origin v1.0.0     # delete on remote
+git tag v1.0.1                      # create correct tag
+git push --tags
+```
+
+Delete the corresponding GitHub Release manually if it was already published.
diff --git a/cmd/ghr/main.go b/cmd/ghr/main.go
new file mode 100644
index 0000000..731efc3
--- /dev/null
+++ b/cmd/ghr/main.go
@@ -0,0 +1,13 @@
+package main
+
+import (
+	"os"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/cli"
+)
+
+func main() {
+	if err := cli.Execute(); err != nil {
+		os.Exit(1)
+	}
+}
diff --git a/config.example.yaml b/config.example.yaml
new file mode 100644
index 0000000..3651690
--- /dev/null
+++ b/config.example.yaml
@@ -0,0 +1,52 @@
+github:
+  url: "https://github.com/my-org"
+  runner_group: "default"
+
+runner:
+  version: "latest"
+  cache_dir: ""
+  workdir_base: ""
+
+groups:
+  - name: "runners"
+    max_runners: 5
+    min_runners: 0
+    labels:
+      - "macos"
+
+health:
+  enabled: true
+  check_interval: "30s"
+  runner_timeout: "2h"
+  idle_timeout: "0"
+  divergence_timeout: "5m"
+  max_consecutive_failures: 5
+  failure_cooldown: "1m"
+  min_disk_space: "1GB"
+
+logging:
+  level: "info"
+  format: "text"
+  dir: ""
+  retention_days: 30
+  runner_output: true
+
+notifications:
+  discord:
+    enabled: false
+    events: []
+    username: "ghr"
+    avatar_url: ""
+    mentions:
+      error: ""
+      critical: ""
+
+monitoring:
+  uptime_kuma:
+    enabled: false
+    degraded_threshold: 0.5
+    report_health_as_ping: true
+
+daemon:
+  state_dir: ""
+  shutdown_timeout: "30s"
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
deleted file mode 100644
index 90e0ba5..0000000
--- a/docs/ARCHITECTURE.md
+++ /dev/null
@@ -1,36 +0,0 @@
-# Architecture Overview
-
-## Packages
-- `cmd/ghr`: entrypoint, wires CLI.
-- `internal/cli`: cobra commands (`daemon`, `apply`, `status`), config flag handling, pid file utilities.
-- `internal/config`: YAML + `.env` loading/validation, defaults for paths/version.
-- `internal/domain`: core domain structs for groups and runner instances.
-- `internal/provider/github`: GitHub API client for runner registration tokens.
-- `internal/runner`: runner lifecycle (download cache, per-runner copy, configure, launch, cleanup).
-- `internal/reconciler`: converges desired groups to running runners; watches exits and scales up/down.
-- `internal/logging`: basic stdout logger.
-
-## Data Paths
-- Cache: `/var/lib/ghr/cache` (runner archives/extracted bits).
-- Workdirs: `/var/lib/ghr/groups/<group>/<id>` (per runner, cleaned on exit).
-- State (pid): `/var/lib/ghr/state/daemon.pid`.
-- Runner pid files: `<workdir>/.ghr-pid` (used for cleanup on startup).
-
-## Control Flow
-1. `ghr daemon --config config.yaml` loads config, creates GitHub client + runner manager + reconciler.
-2. Daemon writes pid file, starts reconcile loop on interval (default 15s).
-3. SIGHUP triggers config reload; reconcile loop also reaps finished runners and recreates ephemerals to maintain counts.
-4. `ghr apply` validates config and sends SIGHUP to daemon to reload.
-5. On startup, daemon calls runner cleanup to kill any stray processes found in configured workdir bases and removes their workdirs to avoid accumulation.
-
-## Runner Lifecycle
-1. Resolve runner version (`latest` via GitHub releases) and download/archive cache if missing.
-2. Copy cached bits to a fresh workdir per runner; run `config.sh --unattended --url ... --token ... [--labels] [--ephemeral]`.
-3. Start `run.sh`; wait/observe exit; cleanup workdir after exit.
-4. Reconciler detects exits and scales replacements for ephemeral groups to keep target counts.
-
-## Security Notes
-- Tokens only via env (`GITHUB_TOKEN`/`GITHUB_PAT`), never in config.
-- Cleanup removes workdirs after runner exit; no per-group users to keep complexity low.
-- macOS-only target; Linux best-effort later.
-
diff --git a/env.example b/env.example
new file mode 100644
index 0000000..805da0f
--- /dev/null
+++ b/env.example
@@ -0,0 +1,13 @@
+# GitHub authentication (alternative to 'ghr login')
+# GITHUB_TOKEN=ghp_xxxxxxxxxxxx
+
+# Discord notifications
+# GHR_DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/xxx/yyy
+
+# Uptime Kuma monitoring
+# GHR_UPTIME_KUMA_URL=https://uptime.example.com
+# GHR_UPTIME_KUMA_DAEMON_TOKEN=your-daemon-push-token
+# GHR_UPTIME_KUMA_TOKEN_RUNNERS=your-group-push-token
+
+# Override credentials file path
+# GHR_CREDENTIALS_FILE=/path/to/credentials.json
diff --git a/go.mod b/go.mod
new file mode 100644
index 0000000..cca8538
--- /dev/null
+++ b/go.mod
@@ -0,0 +1,17 @@
+module github.com/RedBoardDev/gh-runners-tool/v2
+
+go 1.25.3
+
+require (
+	github.com/actions/scaleset v0.4.0 // indirect
+	github.com/golang-jwt/jwt/v4 v4.5.2 // indirect
+	github.com/google/uuid v1.6.0 // indirect
+	github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
+	github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
+	github.com/inconshreveable/mousetrap v1.1.0 // indirect
+	github.com/joho/godotenv v1.5.1 // indirect
+	github.com/oklog/run v1.2.0 // indirect
+	github.com/spf13/cobra v1.10.2 // indirect
+	github.com/spf13/pflag v1.0.10 // indirect
+	gopkg.in/yaml.v3 v3.0.1 // indirect
+)
diff --git a/go.sum b/go.sum
new file mode 100644
index 0000000..34572d1
--- /dev/null
+++ b/go.sum
@@ -0,0 +1,27 @@
+github.com/actions/scaleset v0.4.0 h1:691GC2AkHb3ZGjfNvatboYoRS7CLr3+4VcZk/6w9IbM=
+github.com/actions/scaleset v0.4.0/go.mod h1:2L2I6rggFWV+zprDet6y7y7Vkm3HPudaup78eSc79Uo=
+github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
+github.com/golang-jwt/jwt/v4 v4.5.2 h1:YtQM7lnr8iZ+j5q71MGKkNw9Mn7AjHM68uc9g5fXeUI=
+github.com/golang-jwt/jwt/v4 v4.5.2/go.mod h1:m21LjoU+eqJr34lmDMbreY2eSTRJ1cv77w39/MY0Ch0=
+github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
+github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
+github.com/hashicorp/go-cleanhttp v0.5.2 h1:035FKYIWjmULyFRBKPs8TBQoi0x6d9G4xc9neXJWAZQ=
+github.com/hashicorp/go-cleanhttp v0.5.2/go.mod h1:kO/YDlP8L1346E6Sodw+PrpBSV4/SoxCXGY6BqNFT48=
+github.com/hashicorp/go-retryablehttp v0.7.8 h1:ylXZWnqa7Lhqpk0L1P1LzDtGcCR0rPVUrx/c8Unxc48=
+github.com/hashicorp/go-retryablehttp v0.7.8/go.mod h1:rjiScheydd+CxvumBsIrFKlx3iS0jrZ7LvzFGFmuKbw=
+github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
+github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
+github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0=
+github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4=
+github.com/oklog/run v1.2.0 h1:O8x3yXwah4A73hJdlrwo/2X6J62gE5qTMusH0dvz60E=
+github.com/oklog/run v1.2.0/go.mod h1:mgDbKRSwPhJfesJ4PntqFUbKQRZ50NgmZTSPlFA0YFk=
+github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
+github.com/spf13/cobra v1.10.2 h1:DMTTonx5m65Ic0GOoRY2c16WCbHxOOw6xxezuLaBpcU=
+github.com/spf13/cobra v1.10.2/go.mod h1:7C1pvHqHw5A4vrJfjNwvOdzYu0Gml16OCs2GRiTUUS4=
+github.com/spf13/pflag v1.0.9/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
+github.com/spf13/pflag v1.0.10 h1:4EBh2KAYBwaONj6b2Ye1GiHfwjqyROoF4RwYO+vPwFk=
+github.com/spf13/pflag v1.0.10/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
+go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
+gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
+gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
+gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
diff --git a/internal/api/handlers.go b/internal/api/handlers.go
new file mode 100644
index 0000000..51806c6
--- /dev/null
+++ b/internal/api/handlers.go
@@ -0,0 +1,67 @@
+package api
+
+import (
+	"encoding/json"
+	"net/http"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type statusResponse struct {
+	Groups map[string][]model.RunnerSnapshot `json:"groups"`
+	Health healthResponse                    `json:"health"`
+}
+
+type healthResponse struct {
+	LastCheck time.Time           `json:"last_check"`
+	Issues    []model.HealthIssue `json:"issues"`
+}
+
+func (s *Server) routes() http.Handler {
+	mux := http.NewServeMux()
+	mux.HandleFunc("GET /status", s.handleStatus)
+	mux.HandleFunc("GET /health", s.handleHealth)
+	return mux
+}
+
+func (s *Server) handleStatus(w http.ResponseWriter, _ *http.Request) {
+	snapshots := s.controller.Snapshots()
+	hs := s.health.Status()
+
+	resp := statusResponse{
+		Groups: snapshots,
+		Health: healthResponse{
+			LastCheck: hs.LastCheck,
+			Issues:    hs.Issues,
+		},
+	}
+
+	writeJSON(w, resp)
+}
+
+func (s *Server) handleHealth(w http.ResponseWriter, _ *http.Request) {
+	hs := s.health.Status()
+
+	resp := healthResponse{
+		LastCheck: hs.LastCheck,
+		Issues:    hs.Issues,
+	}
+
+	writeJSON(w, resp)
+}
+
+func writeJSON(w http.ResponseWriter, v any) {
+	w.Header().Set("Content-Type", "application/json")
+
+	data, err := json.Marshal(v)
+	if err != nil {
+		w.WriteHeader(http.StatusInternalServerError)
+		return
+	}
+
+	_, writeErr := w.Write(data)
+	if writeErr != nil {
+		return
+	}
+}
diff --git a/internal/api/server.go b/internal/api/server.go
new file mode 100644
index 0000000..6b7f2fa
--- /dev/null
+++ b/internal/api/server.go
@@ -0,0 +1,97 @@
+package api
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"net"
+	"net/http"
+	"os"
+	"path/filepath"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/health"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type controllerState interface {
+	Snapshots() map[string][]model.RunnerSnapshot
+}
+
+type healthState interface {
+	Status() health.HealthStatus
+}
+
+type Server struct {
+	socketPath string
+	controller controllerState
+	health     healthState
+	logger     *slog.Logger
+	listener   net.Listener
+}
+
+func NewServer(stateDir string, controller controllerState, healthProvider healthState, logger *slog.Logger) *Server {
+	return &Server{
+		socketPath: filepath.Join(stateDir, "ghr.sock"),
+		controller: controller,
+		health:     healthProvider,
+		logger:     logger,
+	}
+}
+
+func (s *Server) Run(ctx context.Context) error {
+	if err := removeStaleSocket(s.socketPath); err != nil {
+		return fmt.Errorf("remove stale socket: %w", err)
+	}
+
+	ln, err := net.Listen("unix", s.socketPath)
+	if err != nil {
+		return fmt.Errorf("listen on %s: %w", s.socketPath, err)
+	}
+	s.listener = ln
+
+	srv := &http.Server{
+		Handler: s.routes(),
+	}
+
+	errCh := make(chan error, 1)
+	go func() {
+		errCh <- srv.Serve(ln)
+	}()
+
+	select {
+	case <-ctx.Done():
+		shutdownErr := srv.Close()
+		cleanupErr := os.Remove(s.socketPath)
+		if cleanupErr != nil && !os.IsNotExist(cleanupErr) {
+			s.logger.Warn("failed to remove socket file", "path", s.socketPath, "error", cleanupErr)
+		}
+		if shutdownErr != nil {
+			return fmt.Errorf("shutdown api server: %w", shutdownErr)
+		}
+		return nil
+	case err := <-errCh:
+		cleanupErr := os.Remove(s.socketPath)
+		if cleanupErr != nil && !os.IsNotExist(cleanupErr) {
+			s.logger.Warn("failed to remove socket file", "path", s.socketPath, "error", cleanupErr)
+		}
+		if errors.Is(err, http.ErrServerClosed) {
+			return nil
+		}
+		return fmt.Errorf("api server: %w", err)
+	}
+}
+
+func removeStaleSocket(path string) error {
+	_, err := os.Stat(path)
+	if os.IsNotExist(err) {
+		return nil
+	}
+	if err != nil {
+		return fmt.Errorf("stat socket %s: %w", path, err)
+	}
+	if err := os.Remove(path); err != nil {
+		return fmt.Errorf("remove socket %s: %w", path, err)
+	}
+	return nil
+}
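
The server above listens on a Unix socket rather than a TCP port, so a client needs a custom dialer. The following is a minimal sketch of how a caller might query it, not code from this change: `newUnixClient`, `fetch`, `demo`, and the placeholder host `ghr` are hypothetical names, and the stand-in handler only imitates the shape of a `/health` response.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

// newUnixClient returns an HTTP client whose connections always go to the
// Unix socket at socketPath, whatever host appears in the request URL.
func newUnixClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// fetch performs a GET for path over the socket and returns the body.
func fetch(client *http.Client, path string) (string, error) {
	// "ghr" is a placeholder host; the dialer ignores it.
	resp, err := client.Get("http://ghr" + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// demo starts a stand-in server on a temporary socket and queries it.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "ghr")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	socketPath := filepath.Join(dir, "ghr.sock")
	ln, err := net.Listen("unix", socketPath)
	if err != nil {
		return "", err
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"issues":[]}`)
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln)
	defer srv.Close()

	return fetch(newUnixClient(socketPath), "/health")
}

func main() {
	body, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

Keeping the dialer inside the transport means the rest of the client code can use ordinary `http.Client` calls and only the socket path has to be known.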
diff --git a/internal/api/server_test.go b/internal/api/server_test.go
new file mode 100644
index 0000000..b1780d6
--- /dev/null
+++ b/internal/api/server_test.go
@@ -0,0 +1,193 @@
+package api
+
+import (
+	"encoding/json"
+	"log/slog"
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"testing"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/health"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type mockController struct {
+	snapshots map[string][]model.RunnerSnapshot
+}
+
+func (m *mockController) Snapshots() map[string][]model.RunnerSnapshot {
+	return m.snapshots
+}
+
+type mockHealth struct {
+	status health.HealthStatus
+}
+
+func (m *mockHealth) Status() health.HealthStatus {
+	return m.status
+}
+
+func testServer(ctrl *mockController, h *mockHealth) *Server {
+	return &Server{
+		controller: ctrl,
+		health:     h,
+		logger:     slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError + 1})),
+	}
+}
+
+func TestHandleStatus(t *testing.T) {
+	now := time.Date(2026, 1, 15, 10, 30, 0, 0, time.UTC)
+
+	ctrl := &mockController{
+		snapshots: map[string][]model.RunnerSnapshot{
+			"group-a": {
+				{Name: "group-a-1", Group: "group-a", State: "idle", PID: 1234, StartedAt: now},
+				{Name: "group-a-2", Group: "group-a", State: "busy", PID: 5678, StartedAt: now},
+			},
+		},
+	}
+	h := &mockHealth{
+		status: health.HealthStatus{
+			LastCheck: now,
+			Issues:    []model.HealthIssue{},
+		},
+	}
+
+	s := testServer(ctrl, h)
+	srv := httptest.NewServer(s.routes())
+	defer srv.Close()
+
+	resp, err := http.Get(srv.URL + "/status")
+	if err != nil {
+		t.Fatalf("GET /status: %v", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		t.Fatalf("expected status 200, got %d", resp.StatusCode)
+	}
+
+	ct := resp.Header.Get("Content-Type")
+	if ct != "application/json" {
+		t.Fatalf("expected Content-Type application/json, got %q", ct)
+	}
+
+	var body statusResponse
+	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
+		t.Fatalf("decode response: %v", err)
+	}
+
+	runners, ok := body.Groups["group-a"]
+	if !ok {
+		t.Fatal("expected group-a in response")
+	}
+	if len(runners) != 2 {
+		t.Fatalf("expected 2 runners in group-a, got %d", len(runners))
+	}
+}
+
+func TestHandleHealth(t *testing.T) {
+	now := time.Date(2026, 1, 15, 10, 30, 0, 0, time.UTC)
+
+	ctrl := &mockController{
+		snapshots: map[string][]model.RunnerSnapshot{},
+	}
+	h := &mockHealth{
+		status: health.HealthStatus{
+			LastCheck: now,
+			Issues: []model.HealthIssue{
+				{
+					Level:      model.LevelWarning,
+					Type:       "health.disk_low",
+					Message:    "disk space below threshold",
+					DetectedAt: now,
+				},
+			},
+		},
+	}
+
+	s := testServer(ctrl, h)
+	srv := httptest.NewServer(s.routes())
+	defer srv.Close()
+
+	resp, err := http.Get(srv.URL + "/health")
+	if err != nil {
+		t.Fatalf("GET /health: %v", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		t.Fatalf("expected status 200, got %d", resp.StatusCode)
+	}
+
+	var body healthResponse
+	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
+		t.Fatalf("decode response: %v", err)
+	}
+
+	if len(body.Issues) != 1 {
+		t.Fatalf("expected 1 issue, got %d", len(body.Issues))
+	}
+	if body.Issues[0].Type != "health.disk_low" {
+		t.Fatalf("expected issue type health.disk_low, got %q", body.Issues[0].Type)
+	}
+}
+
+func TestRoutes_NotFound(t *testing.T) {
+	ctrl := &mockController{
+		snapshots: map[string][]model.RunnerSnapshot{},
+	}
+	h := &mockHealth{
+		status: health.HealthStatus{},
+	}
+
+	s := testServer(ctrl, h)
+	srv := httptest.NewServer(s.routes())
+	defer srv.Close()
+
+	resp, err := http.Get(srv.URL + "/unknown")
+	if err != nil {
+		t.Fatalf("GET /unknown: %v", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusNotFound {
+		t.Fatalf("expected status 404, got %d", resp.StatusCode)
+	}
+}
+
+func TestHandleStatus_EmptyGroups(t *testing.T) {
+	ctrl := &mockController{
+		snapshots: map[string][]model.RunnerSnapshot{},
+	}
+	h := &mockHealth{
+		status: health.HealthStatus{
+			Issues: []model.HealthIssue{},
+		},
+	}
+
+	s := testServer(ctrl, h)
+	srv := httptest.NewServer(s.routes())
+	defer srv.Close()
+
+	resp, err := http.Get(srv.URL + "/status")
+	if err != nil {
+		t.Fatalf("GET /status: %v", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		t.Fatalf("expected status 200, got %d", resp.StatusCode)
+	}
+
+	var body statusResponse
+	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
+		t.Fatalf("decode response: %v", err)
+	}
+
+	if len(body.Groups) != 0 {
+		t.Fatalf("expected 0 groups, got %d", len(body.Groups))
+	}
+}
diff --git a/internal/auth/auth_test.go b/internal/auth/auth_test.go
new file mode 100644
index 0000000..dbdaad8
--- /dev/null
+++ b/internal/auth/auth_test.go
@@ -0,0 +1,591 @@
+package auth
+
+import (
+	"context"
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+	"time"
+)
+
+func TestFilePath(t *testing.T) {
+	t.Run("with GHR_CREDENTIALS_FILE env set", func(t *testing.T) {
+		want := "/custom/path/credentials.json"
+		t.Setenv("GHR_CREDENTIALS_FILE", want)
+
+		got := FilePath()
+		if got != want {
+			t.Errorf("FilePath() = %q, want %q", got, want)
+		}
+	})
+
+	t.Run("without env non-root returns home config path", func(t *testing.T) {
+		if os.Getuid() == 0 {
+			t.Skip("test requires non-root user")
+		}
+
+		t.Setenv("GHR_CREDENTIALS_FILE", "")
+
+		// As a non-root user, FilePath should resolve to ~/.config/ghr/credentials.json.
+		got := FilePath()
+
+		home, err := os.UserHomeDir()
+		if err != nil {
+			t.Fatalf("UserHomeDir() error: %v", err)
+		}
+		want := filepath.Join(home, ".config", "ghr", "credentials.json")
+		if got != want {
+			t.Errorf("FilePath() = %q, want %q", got, want)
+		}
+	})
+}
+
+func TestLoad_TokenFlag(t *testing.T) {
+	// Point credentials file to a non-existent path to avoid reading real credentials
+	t.Setenv("GHR_CREDENTIALS_FILE", filepath.Join(t.TempDir(), "nonexistent.json"))
+	t.Setenv("GITHUB_TOKEN", "")
+
+	creds, source, err := Load(LoadOpts{TokenFlag: "ghp_flagtoken123"})
+	if err != nil {
+		t.Fatalf("Load() error: %v", err)
+	}
+	if creds.Method != "pat" {
+		t.Errorf("Method = %q, want %q", creds.Method, "pat")
+	}
+	if creds.PAT != "ghp_flagtoken123" {
+		t.Errorf("PAT = %q, want %q", creds.PAT, "ghp_flagtoken123")
+	}
+	if source != "flag (--token)" {
+		t.Errorf("source = %q, want %q", source, "flag (--token)")
+	}
+}
+
+func TestLoad_EnvVar(t *testing.T) {
+	// Point credentials file to a non-existent path
+	t.Setenv("GHR_CREDENTIALS_FILE", filepath.Join(t.TempDir(), "nonexistent.json"))
+	t.Setenv("GITHUB_TOKEN", "ghp_envtoken456")
+
+	creds, source, err := Load(LoadOpts{})
+	if err != nil {
+		t.Fatalf("Load() error: %v", err)
+	}
+	if creds.Method != "pat" {
+		t.Errorf("Method = %q, want %q", creds.Method, "pat")
+	}
+	if creds.PAT != "ghp_envtoken456" {
+		t.Errorf("PAT = %q, want %q", creds.PAT, "ghp_envtoken456")
+	}
+	if source != "env (GITHUB_TOKEN)" {
+		t.Errorf("source = %q, want %q", source, "env (GITHUB_TOKEN)")
+	}
+}
+
+func TestLoad_CredentialsFile(t *testing.T) {
+	dir := t.TempDir()
+	credFile := filepath.Join(dir, "credentials.json")
+	t.Setenv("GHR_CREDENTIALS_FILE", credFile)
+	t.Setenv("GITHUB_TOKEN", "")
+
+	creds := &Credentials{
+		Method:    "pat",
+		GitHubURL: "https://github.com/my-org",
+		PAT:       "ghp_fromfile789",
+		CreatedAt: time.Date(2025, 1, 15, 10, 0, 0, 0, time.UTC),
+	}
+	data, err := json.MarshalIndent(creds, "", "  ")
+	if err != nil {
+		t.Fatalf("MarshalIndent() error: %v", err)
+	}
+	if err := os.WriteFile(credFile, data, 0600); err != nil {
+		t.Fatalf("WriteFile() error: %v", err)
+	}
+
+	loaded, source, err := Load(LoadOpts{})
+	if err != nil {
+		t.Fatalf("Load() error: %v", err)
+	}
+	if loaded.Method != "pat" {
+		t.Errorf("Method = %q, want %q", loaded.Method, "pat")
+	}
+	if loaded.PAT != "ghp_fromfile789" {
+		t.Errorf("PAT = %q, want %q", loaded.PAT, "ghp_fromfile789")
+	}
+	if loaded.GitHubURL != "https://github.com/my-org" {
+		t.Errorf("GitHubURL = %q, want %q", loaded.GitHubURL, "https://github.com/my-org")
+	}
+	if !strings.Contains(source, "file") {
+		t.Errorf("source = %q, want it to contain %q", source, "file")
+	}
+}
+
+func TestLoad_Priority(t *testing.T) {
+	t.Run("TokenFlag wins over GITHUB_TOKEN", func(t *testing.T) {
+		t.Setenv("GHR_CREDENTIALS_FILE", filepath.Join(t.TempDir(), "nonexistent.json"))
+		t.Setenv("GITHUB_TOKEN", "ghp_env_should_lose")
+
+		creds, source, err := Load(LoadOpts{TokenFlag: "ghp_flag_should_win"})
+		if err != nil {
+			t.Fatalf("Load() error: %v", err)
+		}
+		if creds.PAT != "ghp_flag_should_win" {
+			t.Errorf("PAT = %q, want %q", creds.PAT, "ghp_flag_should_win")
+		}
+		if source != "flag (--token)" {
+			t.Errorf("source = %q, want %q", source, "flag (--token)")
+		}
+	})
+
+	t.Run("GITHUB_TOKEN wins over credentials file", func(t *testing.T) {
+		dir := t.TempDir()
+		credFile := filepath.Join(dir, "credentials.json")
+		t.Setenv("GHR_CREDENTIALS_FILE", credFile)
+		t.Setenv("GITHUB_TOKEN", "ghp_env_should_win")
+
+		fileCreds := &Credentials{
+			Method:    "pat",
+			PAT:       "ghp_file_should_lose",
+			CreatedAt: time.Now(),
+		}
+		data, err := json.MarshalIndent(fileCreds, "", "  ")
+		if err != nil {
+			t.Fatalf("MarshalIndent() error: %v", err)
+		}
+		if err := os.WriteFile(credFile, data, 0600); err != nil {
+			t.Fatalf("WriteFile() error: %v", err)
+		}
+
+		creds, source, err := Load(LoadOpts{})
+		if err != nil {
+			t.Fatalf("Load() error: %v", err)
+		}
+		if creds.PAT != "ghp_env_should_win" {
+			t.Errorf("PAT = %q, want %q", creds.PAT, "ghp_env_should_win")
+		}
+		if source != "env (GITHUB_TOKEN)" {
+			t.Errorf("source = %q, want %q", source, "env (GITHUB_TOKEN)")
+		}
+	})
+}
+
+func TestLoad_NotAuthenticated(t *testing.T) {
+	t.Setenv("GHR_CREDENTIALS_FILE", filepath.Join(t.TempDir(), "nonexistent.json"))
+	t.Setenv("GITHUB_TOKEN", "")
+
+	_, _, err := Load(LoadOpts{})
+	if err == nil {
+		t.Fatal("Load() expected error, got nil")
+	}
+	if !strings.Contains(err.Error(), "not authenticated") {
+		t.Errorf("error = %q, want it to contain %q", err.Error(), "not authenticated")
+	}
+}
+
+func TestSave_And_Load(t *testing.T) {
+	dir := t.TempDir()
+	credFile := filepath.Join(dir, "credentials.json")
+	t.Setenv("GHR_CREDENTIALS_FILE", credFile)
+	t.Setenv("GITHUB_TOKEN", "")
+
+	original := &Credentials{
+		Method:    "pat",
+		GitHubURL: "https://github.com/test-org",
+		PAT:       "ghp_saveandload123",
+		CreatedAt: time.Date(2025, 6, 1, 12, 0, 0, 0, time.UTC),
+	}
+
+	if err := Save(original); err != nil {
+		t.Fatalf("Save() error: %v", err)
+	}
+
+	// Verify file permissions are 0600
+	info, err := os.Stat(credFile)
+	if err != nil {
+		t.Fatalf("Stat() error: %v", err)
+	}
+	perm := info.Mode().Perm()
+	if perm != 0600 {
+		t.Errorf("file permissions = %o, want %o", perm, 0600)
+	}
+
+	// Load back and verify
+	loaded, source, err := Load(LoadOpts{})
+	if err != nil {
+		t.Fatalf("Load() error: %v", err)
+	}
+	if loaded.Method != original.Method {
+		t.Errorf("Method = %q, want %q", loaded.Method, original.Method)
+	}
+	if loaded.PAT != original.PAT {
+		t.Errorf("PAT = %q, want %q", loaded.PAT, original.PAT)
+	}
+	if loaded.GitHubURL != original.GitHubURL {
+		t.Errorf("GitHubURL = %q, want %q", loaded.GitHubURL, original.GitHubURL)
+	}
+	if !strings.Contains(source, "file") {
+		t.Errorf("source = %q, want it to contain %q", source, "file")
+	}
+}
+
+func TestSave_CreatesDirectory(t *testing.T) {
+	dir := t.TempDir()
+	nestedPath := filepath.Join(dir, "nested", "deep", "credentials.json")
+	t.Setenv("GHR_CREDENTIALS_FILE", nestedPath)
+
+	creds := &Credentials{
+		Method:    "pat",
+		PAT:       "ghp_nested123",
+		CreatedAt: time.Now(),
+	}
+
+	if err := Save(creds); err != nil {
+		t.Fatalf("Save() error: %v", err)
+	}
+
+	// Verify parent directory was created with 0700
+	parentDir := filepath.Dir(nestedPath)
+	info, err := os.Stat(parentDir)
+	if err != nil {
+		t.Fatalf("Stat(%s) error: %v", parentDir, err)
+	}
+	if !info.IsDir() {
+		t.Errorf("%s is not a directory", parentDir)
+	}
+	perm := info.Mode().Perm()
+	if perm != 0700 {
+		t.Errorf("directory permissions = %o, want %o", perm, 0700)
+	}
+}
+
+func TestSave_SetsCreatedAt(t *testing.T) {
+	dir := t.TempDir()
+	credFile := filepath.Join(dir, "credentials.json")
+	t.Setenv("GHR_CREDENTIALS_FILE", credFile)
+
+	before := time.Now().Add(-time.Second)
+
+	creds := &Credentials{
+		Method: "pat",
+		PAT:    "ghp_timestamp123",
+		// CreatedAt is zero
+	}
+
+	if err := Save(creds); err != nil {
+		t.Fatalf("Save() error: %v", err)
+	}
+
+	after := time.Now().Add(time.Second)
+
+	if creds.CreatedAt.IsZero() {
+		t.Fatal("CreatedAt should not be zero after Save()")
+	}
+	if creds.CreatedAt.Before(before) {
+		t.Errorf("CreatedAt = %v, want after %v", creds.CreatedAt, before)
+	}
+	if creds.CreatedAt.After(after) {
+		t.Errorf("CreatedAt = %v, want before %v", creds.CreatedAt, after)
+	}
+}
+
+func TestRemove(t *testing.T) {
+	t.Run("save then remove", func(t *testing.T) {
+		dir := t.TempDir()
+		credFile := filepath.Join(dir, "credentials.json")
+		t.Setenv("GHR_CREDENTIALS_FILE", credFile)
+
+		creds := &Credentials{
+			Method:    "pat",
+			PAT:       "ghp_removeme",
+			CreatedAt: time.Now(),
+		}
+		if err := Save(creds); err != nil {
+			t.Fatalf("Save() error: %v", err)
+		}
+
+		// Verify file exists
+		if _, err := os.Stat(credFile); err != nil {
+			t.Fatalf("file should exist before Remove(), Stat error: %v", err)
+		}
+
+		if err := Remove(); err != nil {
+			t.Fatalf("Remove() error: %v", err)
+		}
+
+		// Verify file no longer exists
+		if _, err := os.Stat(credFile); !os.IsNotExist(err) {
+			t.Errorf("file should not exist after Remove(), Stat error: %v", err)
+		}
+	})
+
+	t.Run("remove when file does not exist", func(t *testing.T) {
+		t.Setenv("GHR_CREDENTIALS_FILE", filepath.Join(t.TempDir(), "nonexistent.json"))
+
+		if err := Remove(); err != nil {
+			t.Errorf("Remove() on non-existent file should not error, got: %v", err)
+		}
+	})
+}
+
+func TestMaskedPAT(t *testing.T) {
+	tests := []struct {
+		name string
+		pat  string
+		want string
+	}{
+		{
+			name: "standard PAT",
+			pat:  "ghp_1234567890abcdef",
+			want: "ghp_...cdef",
+		},
+		{
+			name: "short token",
+			pat:  "short",
+			want: "****",
+		},
+		{
+			name: "empty token",
+			pat:  "",
+			want: "****",
+		},
+		{
+			name: "exactly 12 chars",
+			pat:  "exactlytwelv",
+			want: "exac...welv",
+		},
+		{
+			name: "11 chars returns mask",
+			pat:  "elevenchar!",
+			want: "****",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := MaskedPAT(tt.pat)
+			if got != tt.want {
+				t.Errorf("MaskedPAT(%q) = %q, want %q", tt.pat, got, tt.want)
+			}
+		})
+	}
+}
+
+func TestValidate_PAT(t *testing.T) {
+	// validatePAT hardcodes "https://api.github.com/user", so a test server URL
+	// cannot be injected without modifying the production code. Instead, these
+	// subtests exercise validatePAT through Validate, temporarily overriding
+	// http.DefaultTransport so that requests reach an httptest server.
+	t.Run("valid PAT", func(t *testing.T) {
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			// Verify the request is well-formed
+			if got := r.Header.Get("Authorization"); got != "Bearer ghp_testtoken" {
+				t.Errorf("Authorization = %q, want %q", got, "Bearer ghp_testtoken")
+			}
+			if got := r.Header.Get("Accept"); got != "application/vnd.github+json" {
+				t.Errorf("Accept = %q, want %q", got, "application/vnd.github+json")
+			}
+			w.Header().Set("X-OAuth-Scopes", "admin:org, repo")
+			w.WriteHeader(http.StatusOK)
+			resp := githubUserResponse{Login: "testuser"}
+			if err := json.NewEncoder(w).Encode(resp); err != nil {
+				t.Errorf("encode response: %v", err)
+			}
+		}))
+		defer srv.Close()
+
+		// Override DefaultTransport to redirect api.github.com to the test server
+		origTransport := http.DefaultTransport
+		http.DefaultTransport = &rewriteTransport{
+			targetURL: srv.URL,
+			wrapped:   origTransport,
+		}
+		defer func() { http.DefaultTransport = origTransport }()
+
+		result, err := Validate(context.Background(), &Credentials{
+			Method: "pat",
+			PAT:    "ghp_testtoken",
+		})
+		if err != nil {
+			t.Fatalf("Validate() error: %v", err)
+		}
+		if !result.Valid {
+			t.Error("Valid = false, want true")
+		}
+		if result.Username != "testuser" {
+			t.Errorf("Username = %q, want %q", result.Username, "testuser")
+		}
+		if len(result.Scopes) != 2 {
+			t.Errorf("Scopes length = %d, want 2", len(result.Scopes))
+		} else {
+			if result.Scopes[0] != "admin:org" {
+				t.Errorf("Scopes[0] = %q, want %q", result.Scopes[0], "admin:org")
+			}
+			if result.Scopes[1] != "repo" {
+				t.Errorf("Scopes[1] = %q, want %q", result.Scopes[1], "repo")
+			}
+		}
+	})
+
+	t.Run("unauthorized PAT", func(t *testing.T) {
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+			w.WriteHeader(http.StatusUnauthorized)
+			_, writeErr := w.Write([]byte(`{"message":"Bad credentials"}`))
+			if writeErr != nil {
+				t.Errorf("write response: %v", writeErr)
+			}
+		}))
+		defer srv.Close()
+
+		origTransport := http.DefaultTransport
+		http.DefaultTransport = &rewriteTransport{
+			targetURL: srv.URL,
+			wrapped:   origTransport,
+		}
+		defer func() { http.DefaultTransport = origTransport }()
+
+		_, err := Validate(context.Background(), &Credentials{
+			Method: "pat",
+			PAT:    "ghp_badtoken",
+		})
+		if err == nil {
+			t.Fatal("Validate() expected error for unauthorized PAT, got nil")
+		}
+		if !strings.Contains(err.Error(), "401") {
+			t.Errorf("error = %q, want it to contain %q", err.Error(), "401")
+		}
+	})
+}
+
+func TestValidate_GitHubApp(t *testing.T) {
+	t.Run("valid private key file", func(t *testing.T) {
+		dir := t.TempDir()
+		keyPath := filepath.Join(dir, "test.pem")
+		if err := os.WriteFile(keyPath, []byte("fake-pem-content"), 0600); err != nil {
+			t.Fatalf("WriteFile() error: %v", err)
+		}
+
+		result, err := Validate(context.Background(), &Credentials{
+			Method: "github_app",
+			GitHubApp: &GitHubAppCreds{
+				ClientID:       "Iv1.abc123",
+				InstallationID: 12345678,
+				PrivateKeyPath: keyPath,
+			},
+		})
+		if err != nil {
+			t.Fatalf("Validate() error: %v", err)
+		}
+		if !result.Valid {
+			t.Error("Valid = false, want true")
+		}
+	})
+
+	t.Run("non-existent private key file", func(t *testing.T) {
+		_, err := Validate(context.Background(), &Credentials{
+			Method: "github_app",
+			GitHubApp: &GitHubAppCreds{
+				ClientID:       "Iv1.abc123",
+				InstallationID: 12345678,
+				PrivateKeyPath: "/nonexistent/path/key.pem",
+			},
+		})
+		if err == nil {
+			t.Fatal("Validate() expected error for non-existent key, got nil")
+		}
+		if !strings.Contains(err.Error(), "open private key") {
+			t.Errorf("error = %q, want it to contain %q", err.Error(), "open private key")
+		}
+	})
+
+	t.Run("nil GitHubAppCreds", func(t *testing.T) {
+		_, err := Validate(context.Background(), &Credentials{
+			Method:    "github_app",
+			GitHubApp: nil,
+		})
+		if err == nil {
+			t.Fatal("Validate() expected error for nil creds, got nil")
+		}
+		if !strings.Contains(err.Error(), "credentials are nil") {
+			t.Errorf("error = %q, want it to contain %q", err.Error(), "credentials are nil")
+		}
+	})
+}
+
+func TestValidate_UnknownMethod(t *testing.T) {
+	_, err := Validate(context.Background(), &Credentials{
+		Method: "unknown",
+	})
+	if err == nil {
+		t.Fatal("Validate() expected error for unknown method, got nil")
+	}
+	if !strings.Contains(err.Error(), "unknown method") {
+		t.Errorf("error = %q, want it to contain %q", err.Error(), "unknown method")
+	}
+}
+
+func TestParseScopes(t *testing.T) {
+	tests := []struct {
+		name   string
+		header string
+		want   []string
+	}{
+		{
+			name:   "multiple scopes",
+			header: "admin:org, repo, workflow",
+			want:   []string{"admin:org", "repo", "workflow"},
+		},
+		{
+			name:   "single scope",
+			header: "repo",
+			want:   []string{"repo"},
+		},
+		{
+			name:   "empty header",
+			header: "",
+			want:   nil,
+		},
+		{
+			name:   "extra whitespace",
+			header: " admin:org , repo ,  workflow ",
+			want:   []string{"admin:org", "repo", "workflow"},
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := parseScopes(tt.header)
+			if len(got) != len(tt.want) {
+				t.Fatalf("parseScopes(%q) length = %d, want %d", tt.header, len(got), len(tt.want))
+			}
+			for i := range got {
+				if got[i] != tt.want[i] {
+					t.Errorf("parseScopes(%q)[%d] = %q, want %q", tt.header, i, got[i], tt.want[i])
+				}
+			}
+		})
+	}
+}
+
+// rewriteTransport is an http.RoundTripper that redirects requests targeting
+// api.github.com to a local httptest server. This allows testing validatePAT
+// without modifying the production code.
+type rewriteTransport struct {
+	targetURL string
+	wrapped   http.RoundTripper
+}
+
+func (t *rewriteTransport) RoundTrip(req *http.Request) (*http.Response, error) {
+	if req.URL.Host == "api.github.com" {
+		// Rebuild the request against the test server, keeping the caller's
+		// context so cancellation and deadlines still apply.
+		parsed, err := http.NewRequestWithContext(req.Context(), req.Method, t.targetURL+req.URL.Path, req.Body)
+		if err != nil {
+			return nil, err
+		}
+		parsed.Header = req.Header
+		req = parsed
+	}
+	return t.wrapped.RoundTrip(req)
+}
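
The table tests above pin down the observable behavior of `MaskedPAT` and `parseScopes`, whose implementations are not part of this excerpt. The sketch below is one way to satisfy those tables, inferred only from the test cases; the lowercase `maskedPAT` is a hypothetical stand-in, and the real functions may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// maskedPAT keeps the first and last four characters of a token and hides
// the rest. Tokens shorter than 12 characters are fully masked, matching the
// "11 chars returns mask" and "exactly 12 chars" cases in the test table.
func maskedPAT(pat string) string {
	if len(pat) < 12 {
		return "****"
	}
	return pat[:4] + "..." + pat[len(pat)-4:]
}

// parseScopes splits a comma-separated X-OAuth-Scopes header value, trims
// surrounding whitespace, and drops empty entries. An empty header yields nil.
func parseScopes(header string) []string {
	var scopes []string
	for _, part := range strings.Split(header, ",") {
		if s := strings.TrimSpace(part); s != "" {
			scopes = append(scopes, s)
		}
	}
	return scopes
}

func main() {
	fmt.Println(maskedPAT("ghp_1234567890abcdef"))
	fmt.Println(parseScopes(" admin:org , repo ,  workflow "))
}
```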
diff --git a/internal/auth/credentials.go b/internal/auth/credentials.go
new file mode 100644
index 0000000..fc5997f
--- /dev/null
+++ b/internal/auth/credentials.go
@@ -0,0 +1,32 @@
+package auth
+
+import "time"
+
+type Credentials struct {
+	Method    string          `json:"method"`
+	GitHubURL string          `json:"github_url"`
+	PAT       string          `json:"pat,omitempty"`
+	GitHubApp *GitHubAppCreds `json:"github_app,omitempty"`
+	CreatedAt time.Time       `json:"created_at"`
+}
+
+type GitHubAppCreds struct {
+	ClientID       string `json:"client_id"`
+	InstallationID int64  `json:"installation_id"`
+	PrivateKeyPath string `json:"private_key_path"`
+}
+
+type LoadOpts struct {
+	TokenFlag string
+}
+
+type ValidationResult struct {
+	Valid    bool
+	Username string
+	Scopes   []string
+	OrgName  string
+}
+
+type githubUserResponse struct {
+	Login string `json:"login"`
+}
diff --git a/internal/auth/load.go b/internal/auth/load.go
new file mode 100644
index 0000000..7ee8250
--- /dev/null
+++ b/internal/auth/load.go
@@ -0,0 +1,32 @@
+package auth
+
+import (
+	"fmt"
+	"os"
+)
+
+func Load(opts LoadOpts) (*Credentials, string, error) {
+	if opts.TokenFlag != "" {
+		return &Credentials{
+			Method: "pat",
+			PAT:    opts.TokenFlag,
+		}, "flag (--token)", nil
+	}
+
+	if token := os.Getenv("GITHUB_TOKEN"); token != "" {
+		return &Credentials{
+			Method: "pat",
+			PAT:    token,
+		}, "env (GITHUB_TOKEN)", nil
+	}
+
+	creds, err := loadFromFile()
+	if err == nil {
+		return creds, fmt.Sprintf("file (%s)", FilePath()), nil
+	}
+	if !os.IsNotExist(err) {
+		return nil, "", fmt.Errorf("load credentials file: %w", err)
+	}
+
+	return nil, "", fmt.Errorf("not authenticated. Run 'ghr login' to set up authentication, or set GITHUB_TOKEN")
+}
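
`Load` resolves credentials in a fixed order: the `--token` flag, then `GITHUB_TOKEN`, then the credentials file. The sketch below isolates that first-non-empty-wins rule as a simplified illustration. The `source` type and `resolve` function are hypothetical and reduce the file lookup to a plain string, so this is not how `Load` itself is structured.

```go
package main

import "fmt"

// source is one place a token may come from (hypothetical type).
type source struct {
	name  string
	token string
}

// resolve returns the first non-empty token, checking sources in the order
// given, together with the name of the source that supplied it.
func resolve(sources ...source) (token, name string, ok bool) {
	for _, s := range sources {
		if s.token != "" {
			return s.token, s.name, true
		}
	}
	return "", "", false
}

func main() {
	// The flag is unset, so the environment variable beats the file.
	token, name, ok := resolve(
		source{name: "flag (--token)", token: ""},
		source{name: "env (GITHUB_TOKEN)", token: "ghp_env"},
		source{name: "file", token: "ghp_file"},
	)
	fmt.Println(token, name, ok)
}
```

Listing the sources in priority order keeps the precedence visible in one place, which is the property `TestLoad_Priority` checks.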
diff --git a/internal/auth/store.go b/internal/auth/store.go
new file mode 100644
index 0000000..639fa3d
--- /dev/null
+++ b/internal/auth/store.go
@@ -0,0 +1,65 @@
+package auth
+
+import (
+	"encoding/json"
+	"fmt"
+	"os"
+	"path/filepath"
+	"time"
+)
+
+func FilePath() string {
+	if p := os.Getenv("GHR_CREDENTIALS_FILE"); p != "" {
+		return p
+	}
+	if os.Getuid() == 0 {
+		return "/etc/ghr/credentials.json"
+	}
+	home, err := os.UserHomeDir()
+	if err != nil {
+		return filepath.Join(".config", "ghr", "credentials.json")
+	}
+	return filepath.Join(home, ".config", "ghr", "credentials.json")
+}
+
+func loadFromFile() (*Credentials, error) {
+	data, err := os.ReadFile(FilePath())
+	if err != nil {
+		return nil, err
+	}
+	var creds Credentials
+	if err := json.Unmarshal(data, &creds); err != nil {
+		return nil, fmt.Errorf("parse credentials file: %w", err)
+	}
+	return &creds, nil
+}
+
+func Save(creds *Credentials) error {
+	if creds.CreatedAt.IsZero() {
+		creds.CreatedAt = time.Now()
+	}
+
+	p := FilePath()
+	dir := filepath.Dir(p)
+	if err := os.MkdirAll(dir, 0o700); err != nil {
+		return fmt.Errorf("create credentials directory %s: %w", dir, err)
+	}
+
+	data, err := json.MarshalIndent(creds, "", "  ")
+	if err != nil {
+		return fmt.Errorf("marshal credentials: %w", err)
+	}
+
+	if err := os.WriteFile(p, data, 0o600); err != nil {
+		return fmt.Errorf("write credentials file %s: %w", p, err)
+	}
+	return nil
+}
+
+func Remove() error {
+	err := os.Remove(FilePath())
+	if err != nil && !os.IsNotExist(err) {
+		return fmt.Errorf("remove credentials file: %w", err)
+	}
+	return nil
+}
diff --git a/internal/auth/validate.go b/internal/auth/validate.go
new file mode 100644
index 0000000..89a2f87
--- /dev/null
+++ b/internal/auth/validate.go
@@ -0,0 +1,101 @@
+package auth
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"strings"
+)
+
+func Validate(ctx context.Context, creds *Credentials) (*ValidationResult, error) {
+	switch creds.Method {
+	case "pat":
+		return validatePAT(ctx, creds.PAT)
+	case "github_app":
+		return validateGitHubApp(creds.GitHubApp)
+	default:
+		return nil, fmt.Errorf("validate credentials: unknown method %q", creds.Method)
+	}
+}
+
+func validatePAT(ctx context.Context, pat string) (*ValidationResult, error) {
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.github.com/user", http.NoBody)
+	if err != nil {
+		return nil, fmt.Errorf("validate PAT: create request: %w", err)
+	}
+	req.Header.Set("Authorization", "Bearer "+pat)
+	req.Header.Set("Accept", "application/vnd.github+json")
+
+	resp, err := http.DefaultClient.Do(req)
+	if err != nil {
+		return nil, fmt.Errorf("validate PAT: request failed: %w", err)
+	}
+	defer func() {
+		_, _ = io.Copy(io.Discard, resp.Body)
+		resp.Body.Close()
+	}()
+
+	body, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, fmt.Errorf("validate PAT: read response: %w", err)
+	}
+
+	if resp.StatusCode != http.StatusOK {
+		return nil, fmt.Errorf("validate PAT: GitHub API returned %d: %s", resp.StatusCode, string(body))
+	}
+
+	var user githubUserResponse
+	if err := json.Unmarshal(body, &user); err != nil {
+		return nil, fmt.Errorf("validate PAT: parse response: %w", err)
+	}
+
+	scopes := parseScopes(resp.Header.Get("X-OAuth-Scopes"))
+
+	return &ValidationResult{
+		Valid:    true,
+		Username: user.Login,
+		Scopes:   scopes,
+	}, nil
+}
+
+func parseScopes(header string) []string {
+	if header == "" {
+		return nil
+	}
+	parts := strings.Split(header, ",")
+	scopes := make([]string, 0, len(parts))
+	for _, p := range parts {
+		s := strings.TrimSpace(p)
+		if s != "" {
+			scopes = append(scopes, s)
+		}
+	}
+	return scopes
+}
+
+func validateGitHubApp(app *GitHubAppCreds) (*ValidationResult, error) {
+	if app == nil {
+		return nil, fmt.Errorf("validate GitHub App: credentials are nil")
+	}
+	f, err := os.Open(app.PrivateKeyPath)
+	if err != nil {
+		return nil, fmt.Errorf("validate GitHub App: open private key %s: %w", app.PrivateKeyPath, err)
+	}
+	if err := f.Close(); err != nil {
+		return nil, fmt.Errorf("validate GitHub App: close private key file: %w", err)
+	}
+
+	return &ValidationResult{
+		Valid: true,
+	}, nil
+}
+
+func MaskedPAT(pat string) string {
+	if len(pat) < 12 {
+		return "****"
+	}
+	return pat[:4] + "..." + pat[len(pat)-4:]
+}
diff --git a/internal/cli/auth.go b/internal/cli/auth.go
new file mode 100644
index 0000000..a41e1ff
--- /dev/null
+++ b/internal/cli/auth.go
@@ -0,0 +1,61 @@
+package cli
+
+import (
+	"fmt"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+	"github.com/spf13/cobra"
+)
+
+func newAuthCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "auth",
+		Short: "Authentication management",
+	}
+
+	cmd.AddCommand(newAuthStatusCmd())
+	return cmd
+}
+
+func newAuthStatusCmd() *cobra.Command {
+	return &cobra.Command{
+		Use:   "status",
+		Short: "Display current authentication state",
+		RunE: func(cmd *cobra.Command, _ []string) error {
+			creds, source, loadErr := auth.Load(auth.LoadOpts{TokenFlag: tokenFlag})
+			if loadErr != nil {
+				fmt.Println("Status:  not authenticated")
+				fmt.Println("Run 'ghr login' to authenticate.")
+				return nil
+			}
+
+			fmt.Printf("Method:  %s\n", creds.Method)
+			fmt.Printf("Source:  %s\n", source)
+			if creds.GitHubURL != "" {
+				fmt.Printf("GitHub:  %s\n", creds.GitHubURL)
+			}
+			if creds.Method == "pat" && creds.PAT != "" {
+				fmt.Printf("Token:   %s\n", auth.MaskedPAT(creds.PAT))
+			}
+			if creds.GitHubApp != nil {
+				fmt.Printf("Client:  %s\n", creds.GitHubApp.ClientID)
+				fmt.Printf("Install: %d\n", creds.GitHubApp.InstallationID)
+				fmt.Printf("Key:     %s\n", creds.GitHubApp.PrivateKeyPath)
+			}
+
+			result, err := auth.Validate(cmd.Context(), creds)
+			if err != nil {
+				fmt.Printf("Status:  validation failed: %v\n", err)
+				return nil
+			}
+			if result.Valid {
+				fmt.Println("Status:  authenticated")
+				if result.Username != "" {
+					fmt.Printf("User:    @%s\n", result.Username)
+				}
+			}
+
+			return nil
+		},
+	}
+}
diff --git a/internal/cli/daemon.go b/internal/cli/daemon.go
new file mode 100644
index 0000000..3170ee5
--- /dev/null
+++ b/internal/cli/daemon.go
@@ -0,0 +1,188 @@
+package cli
+
+import (
+	"context"
+	"fmt"
+	"log/slog"
+	"os"
+	"path/filepath"
+	"strconv"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/api"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/config"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/controller"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/github"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/health"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/logging"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/monitoring"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/notification"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/runner"
+)
+
+type daemon struct {
+	ctrl   *controller.GroupController
+	health *health.Monitor
+	api    *api.Server
+	logMgr *logging.LogManager
+	cfg    *config.Config
+	logger *slog.Logger
+}
+
+func buildDaemon(cfg *config.Config, creds *auth.Credentials, githubURL string) (*daemon, error) {
+	logMgr, err := logging.New(logging.LogConfig{
+		Level:         cfg.Logging.Level,
+		Format:        cfg.Logging.Format,
+		Dir:           cfg.Logging.Dir,
+		RetentionDays: cfg.Logging.RetentionDays,
+		RunnerOutput:  cfg.Logging.RunnerOutput != nil && *cfg.Logging.RunnerOutput,
+	})
+	if err != nil {
+		return nil, fmt.Errorf("setup logging: %w", err)
+	}
+
+	logger, err := logMgr.DaemonLogger()
+	if err != nil {
+		logMgr.Close()
+		return nil, fmt.Errorf("create daemon logger: %w", err)
+	}
+
+	if err := logMgr.CleanupOldLogs(); err != nil {
+		logger.Warn("log cleanup failed", "error", err)
+	}
+
+	ghClient, err := github.NewClient(creds, githubURL)
+	if err != nil {
+		logMgr.Close()
+		return nil, fmt.Errorf("create github client: %w", err)
+	}
+
+	binaryMgr := runner.NewBinaryManager(cfg.Runner.CacheDir, logger)
+	processMgr := runner.NewProcessManager(cfg.Runner.WorkdirBase, logger)
+
+	if err := processMgr.CleanupStale(context.Background()); err != nil {
+		logger.Warn("stale runner cleanup failed", "error", err)
+	}
+	processMgr.KillOrphanRunners(context.Background())
+
+	notifSvc := buildNotificationService(cfg, logger)
+	reporters := buildReporters(cfg, logger)
+
+	ctrl := controller.New(
+		ghClient, binaryMgr, processMgr, notifSvc, logMgr,
+		cfg.Groups, controller.ControllerConfig{
+			RunnerVersion: cfg.Runner.Version,
+			RunnerGroupID: 1,
+		}, logger,
+	)
+
+	var minDiskSpace int64
+	if cfg.Health.MinDiskSpace != "" {
+		minDiskSpace, _ = config.ParseByteSize(cfg.Health.MinDiskSpace)
+	}
+
+	healthMon := health.NewMonitor(health.MonitorConfig{
+		Enabled:                cfg.Health.Enabled,
+		CheckInterval:          cfg.Health.CheckInterval.Duration,
+		RunnerTimeout:          cfg.Health.RunnerTimeout.Duration,
+		IdleTimeout:            cfg.Health.IdleTimeout.Duration,
+		DivergenceTimeout:      cfg.Health.DivergenceTimeout.Duration,
+		MaxConsecutiveFailures: cfg.Health.MaxConsecutiveFailures,
+		FailureCooldown:        cfg.Health.FailureCooldown.Duration,
+		MinDiskSpace:           minDiskSpace,
+		GroupMinRunners:        buildGroupMinRunners(cfg),
+	}, notifSvc, ctrl, reporters, ctrl, logger)
+
+	apiServer := api.NewServer(cfg.Daemon.StateDir, ctrl, healthMon, logger)
+
+	return &daemon{
+		ctrl:   ctrl,
+		health: healthMon,
+		api:    apiServer,
+		logMgr: logMgr,
+		cfg:    cfg,
+		logger: logger,
+	}, nil
+}
+
+func buildNotificationService(cfg *config.Config, logger *slog.Logger) *notification.Service {
+	var providers []notification.Provider
+	filters := make(map[string]notification.EventFilter)
+
+	if cfg.Notifications.Discord.Enabled && cfg.Notifications.Discord.WebhookURL != "" {
+		providers = append(providers, notification.NewDiscord(&notification.DiscordConfig{
+			WebhookURL: cfg.Notifications.Discord.WebhookURL,
+			Username:   cfg.Notifications.Discord.Username,
+			AvatarURL:  cfg.Notifications.Discord.AvatarURL,
+			Mentions: notification.DiscordMentions{
+				Error:    cfg.Notifications.Discord.Mentions.Error,
+				Critical: cfg.Notifications.Discord.Mentions.Critical,
+			},
+		}))
+		filters["discord"] = notification.EventFilter{
+			Patterns: cfg.Notifications.Discord.Events,
+		}
+	}
+
+	return notification.New(providers, filters, logger)
+}
+
+func buildReporters(cfg *config.Config, logger *slog.Logger) []health.Reporter {
+	var reporters []health.Reporter
+
+	logger.Debug("uptime-kuma config",
+		"enabled", cfg.Monitoring.UptimeKuma.Enabled,
+		"base_url_set", cfg.Monitoring.UptimeKuma.BaseURL != "",
+		"daemon_token_set", cfg.Monitoring.UptimeKuma.DaemonToken != "",
+		"group_tokens", len(cfg.Monitoring.UptimeKuma.GroupTokens),
+	)
+	if cfg.Monitoring.UptimeKuma.Enabled && cfg.Monitoring.UptimeKuma.BaseURL != "" {
+		reporters = append(reporters, monitoring.NewUptimeKuma(monitoring.UptimeKumaConfig{
+			BaseURL:            cfg.Monitoring.UptimeKuma.BaseURL,
+			DaemonToken:        cfg.Monitoring.UptimeKuma.DaemonToken,
+			GroupTokens:        cfg.Monitoring.UptimeKuma.GroupTokens,
+			DegradedThreshold:  cfg.Monitoring.UptimeKuma.DegradedThreshold,
+			ReportHealthAsPing: cfg.Monitoring.UptimeKuma.ReportHealthAsPing,
+		}, logger))
+	}
+
+	return reporters
+}
+
+func resolveGitHubURL(creds *auth.Credentials, cfg *config.Config) (string, error) {
+	if creds.GitHubURL != "" {
+		return creds.GitHubURL, nil
+	}
+	if cfg.GitHub.URL != "" {
+		return cfg.GitHub.URL, nil
+	}
+	return "", fmt.Errorf("github URL not configured: set it via 'ghr login' or in config github.url")
+}
+
+func pidFilePath(stateDir string) string {
+	return filepath.Join(stateDir, "daemon.pid")
+}
+
+func writePIDFile(path string) error {
+	dir := filepath.Dir(path)
+	if err := os.MkdirAll(dir, 0o755); err != nil {
+		return fmt.Errorf("create pid file directory %s: %w", dir, err)
+	}
+	pid := strconv.Itoa(os.Getpid())
+	if err := os.WriteFile(path, []byte(pid), 0o644); err != nil {
+		return fmt.Errorf("write pid file %s: %w", path, err)
+	}
+	return nil
+}
+
+func removePIDFile(path string) {
+	_ = os.Remove(path)
+}
+
+func buildGroupMinRunners(cfg *config.Config) map[string]int {
+	m := make(map[string]int, len(cfg.Groups))
+	for _, g := range cfg.Groups {
+		m[g.Name] = g.MinRunners
+	}
+	return m
+}
diff --git a/internal/cli/login.go b/internal/cli/login.go
new file mode 100644
index 0000000..7de8644
--- /dev/null
+++ b/internal/cli/login.go
@@ -0,0 +1,123 @@
+package cli
+
+import (
+	"bufio"
+	"fmt"
+	"os"
+	"strings"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+	"github.com/spf13/cobra"
+)
+
+func newLoginCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "login",
+		Short: "Authenticate with GitHub",
+		Long:  "Interactive wizard to configure GitHub authentication. Supports PAT and GitHub App.",
+		RunE:  runLogin,
+	}
+
+	cmd.Flags().String("method", "", "auth method: pat or app")
+	cmd.Flags().String("url", "", "GitHub URL (org, repo, or enterprise)")
+	cmd.Flags().String("client-id", "", "GitHub App client ID")
+	cmd.Flags().Int64("installation-id", 0, "GitHub App installation ID")
+	cmd.Flags().String("private-key", "", "path to GitHub App private key (.pem)")
+
+	return cmd
+}
+
+func runLogin(cmd *cobra.Command, _ []string) error {
+	method, err := cmd.Flags().GetString("method")
+	if err != nil {
+		return fmt.Errorf("get method flag: %w", err)
+	}
+
+	if method == "" {
+		reader := bufio.NewReader(os.Stdin)
+		return interactiveLogin(cmd, reader)
+	}
+
+	return nonInteractiveLogin(cmd, method)
+}
+
+func nonInteractiveLogin(cmd *cobra.Command, method string) error {
+	url, err := cmd.Flags().GetString("url")
+	if err != nil {
+		return fmt.Errorf("get url flag: %w", err)
+	}
+
+	var creds *auth.Credentials
+
+	switch method {
+	case "pat":
+		if tokenFlag == "" {
+			return fmt.Errorf("--token is required for PAT authentication")
+		}
+		if url == "" {
+			return fmt.Errorf("--url is required")
+		}
+		creds = &auth.Credentials{
+			Method:    "pat",
+			GitHubURL: url,
+			PAT:       tokenFlag,
+		}
+
+	case "app":
+		clientID, flagErr := cmd.Flags().GetString("client-id")
+		if flagErr != nil {
+			return fmt.Errorf("get client-id flag: %w", flagErr)
+		}
+		installationID, flagErr := cmd.Flags().GetInt64("installation-id")
+		if flagErr != nil {
+			return fmt.Errorf("get installation-id flag: %w", flagErr)
+		}
+		privateKey, flagErr := cmd.Flags().GetString("private-key")
+		if flagErr != nil {
+			return fmt.Errorf("get private-key flag: %w", flagErr)
+		}
+		if clientID == "" || installationID == 0 || privateKey == "" || url == "" {
+			return fmt.Errorf("--client-id, --installation-id, --private-key, and --url are all required for GitHub App authentication")
+		}
+		creds = &auth.Credentials{
+			Method:    "github_app",
+			GitHubURL: url,
+			GitHubApp: &auth.GitHubAppCreds{
+				ClientID:       clientID,
+				InstallationID: installationID,
+				PrivateKeyPath: privateKey,
+			},
+		}
+
+	default:
+		return fmt.Errorf("unknown method %q: must be 'pat' or 'app'", method)
+	}
+
+	return validateAndSave(cmd, creds)
+}
+
+func validateAndSave(cmd *cobra.Command, creds *auth.Credentials) error {
+	fmt.Println("  Validating...")
+	result, err := auth.Validate(cmd.Context(), creds)
+	if err != nil {
+		return fmt.Errorf("validation failed: %w", err)
+	}
+
+	if !result.Valid {
+		return fmt.Errorf("credentials are not valid")
+	}
+
+	if err := auth.Save(creds); err != nil {
+		return fmt.Errorf("save credentials: %w", err)
+	}
+
+	if creds.Method == "pat" && result.Username != "" {
+		fmt.Printf("✓ Authenticated as @%s\n", result.Username)
+	}
+	if creds.Method == "pat" && len(result.Scopes) > 0 {
+		fmt.Printf("✓ Scopes: %s\n", strings.Join(result.Scopes, ", "))
+	}
+	fmt.Printf("✓ Credentials saved to %s\n", auth.FilePath())
+
+	return nil
+}
diff --git a/internal/cli/login_wizard.go b/internal/cli/login_wizard.go
new file mode 100644
index 0000000..27ab4ac
--- /dev/null
+++ b/internal/cli/login_wizard.go
@@ -0,0 +1,118 @@
+package cli
+
+import (
+	"bufio"
+	"fmt"
+	"strconv"
+	"strings"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+	"github.com/spf13/cobra"
+)
+
+func interactiveLogin(cmd *cobra.Command, reader *bufio.Reader) error {
+	fmt.Println()
+	fmt.Println("? Authentication method")
+	fmt.Println("  1) Personal Access Token (PAT)")
+	fmt.Println("  2) GitHub App")
+	fmt.Print("> ")
+
+	choice, err := reader.ReadString('\n')
+	if err != nil {
+		return fmt.Errorf("read choice: %w", err)
+	}
+	choice = strings.TrimSpace(choice)
+
+	switch choice {
+	case "1":
+		return interactivePAT(cmd, reader)
+	case "2":
+		return interactiveApp(cmd, reader)
+	default:
+		return fmt.Errorf("invalid choice: %q (expected 1 or 2)", choice)
+	}
+}
+
+func interactivePAT(cmd *cobra.Command, reader *bufio.Reader) error {
+	fmt.Print("? GitHub PAT: ")
+	token, err := reader.ReadString('\n')
+	if err != nil {
+		return fmt.Errorf("read token: %w", err)
+	}
+	token = strings.TrimSpace(token)
+	if token == "" {
+		return fmt.Errorf("token cannot be empty")
+	}
+
+	fmt.Print("? GitHub URL (org or repo): ")
+	url, err := reader.ReadString('\n')
+	if err != nil {
+		return fmt.Errorf("read url: %w", err)
+	}
+	url = strings.TrimSpace(url)
+	if url == "" {
+		return fmt.Errorf("URL cannot be empty")
+	}
+
+	creds := &auth.Credentials{
+		Method:    "pat",
+		GitHubURL: url,
+		PAT:       token,
+	}
+
+	return validateAndSave(cmd, creds)
+}
+
+func interactiveApp(cmd *cobra.Command, reader *bufio.Reader) error {
+	fmt.Print("? GitHub App Client ID: ")
+	clientID, err := reader.ReadString('\n')
+	if err != nil {
+		return fmt.Errorf("read client ID: %w", err)
+	}
+	clientID = strings.TrimSpace(clientID)
+	if clientID == "" {
+		return fmt.Errorf("client ID cannot be empty")
+	}
+
+	fmt.Print("? Installation ID: ")
+	installIDStr, err := reader.ReadString('\n')
+	if err != nil {
+		return fmt.Errorf("read installation ID: %w", err)
+	}
+	installID, err := strconv.ParseInt(strings.TrimSpace(installIDStr), 10, 64)
+	if err != nil {
+		return fmt.Errorf("parse installation ID: %w", err)
+	}
+
+	fmt.Print("? Private key path (.pem): ")
+	keyPath, err := reader.ReadString('\n')
+	if err != nil {
+		return fmt.Errorf("read private key path: %w", err)
+	}
+	keyPath = strings.TrimSpace(keyPath)
+	if keyPath == "" {
+		return fmt.Errorf("private key path cannot be empty")
+	}
+
+	fmt.Print("? GitHub URL: ")
+	url, err := reader.ReadString('\n')
+	if err != nil {
+		return fmt.Errorf("read url: %w", err)
+	}
+	url = strings.TrimSpace(url)
+	if url == "" {
+		return fmt.Errorf("URL cannot be empty")
+	}
+
+	creds := &auth.Credentials{
+		Method:    "github_app",
+		GitHubURL: url,
+		GitHubApp: &auth.GitHubAppCreds{
+			ClientID:       clientID,
+			InstallationID: installID,
+			PrivateKeyPath: keyPath,
+		},
+	}
+
+	return validateAndSave(cmd, creds)
+}
diff --git a/internal/cli/logout.go b/internal/cli/logout.go
new file mode 100644
index 0000000..bd2eb24
--- /dev/null
+++ b/internal/cli/logout.go
@@ -0,0 +1,22 @@
+package cli
+
+import (
+	"fmt"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+	"github.com/spf13/cobra"
+)
+
+func newLogoutCmd() *cobra.Command {
+	return &cobra.Command{
+		Use:   "logout",
+		Short: "Remove saved credentials",
+		RunE: func(_ *cobra.Command, _ []string) error {
+			if err := auth.Remove(); err != nil {
+				return err
+			}
+			fmt.Println("Credentials removed")
+			return nil
+		},
+	}
+}
diff --git a/internal/cli/purge.go b/internal/cli/purge.go
new file mode 100644
index 0000000..916d641
--- /dev/null
+++ b/internal/cli/purge.go
@@ -0,0 +1,181 @@
+package cli
+
+import (
+	"context"
+	"fmt"
+	"os"
+	"path/filepath"
+	"syscall"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/config"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/github"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/launchd"
+	"github.com/spf13/cobra"
+)
+
+func newPurgeCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "purge",
+		Short: "Stop everything, delete scale sets, clean workdirs",
+		RunE:  runPurge,
+	}
+
+	cmd.Flags().Duration("timeout", 5*time.Minute, "max wait for busy runners")
+	cmd.Flags().Bool("force", false, "don't wait for busy runners")
+
+	return cmd
+}
+
+func runPurge(cmd *cobra.Command, _ []string) error {
+	if cfgFile == "" {
+		return fmt.Errorf("--config is required")
+	}
+
+	timeout, err := cmd.Flags().GetDuration("timeout")
+	if err != nil {
+		return fmt.Errorf("get timeout flag: %w", err)
+	}
+
+	force, err := cmd.Flags().GetBool("force")
+	if err != nil {
+		return fmt.Errorf("get force flag: %w", err)
+	}
+
+	stopDaemonIfRunning()
+
+	cfg, err := config.Load(cfgFile)
+	if err != nil {
+		return fmt.Errorf("load config: %w", err)
+	}
+
+	creds, _, err := auth.Load(auth.LoadOpts{TokenFlag: tokenFlag})
+	if err != nil {
+		return fmt.Errorf("load credentials: %w", err)
+	}
+
+	githubURL, err := resolveGitHubURL(creds, cfg)
+	if err != nil {
+		return err
+	}
+
+	ghClient, err := github.NewClient(creds, githubURL)
+	if err != nil {
+		return fmt.Errorf("create github client: %w", err)
+	}
+
+	ctx := context.Background()
+	deletedSets := purgeScaleSets(ctx, ghClient, cfg, force, timeout)
+	cleanedDirs := cleanupWorkdirs(cfg.Runner.WorkdirBase)
+	cleanupStateFiles(cfg.Daemon.StateDir)
+
+	fmt.Printf("purge complete: deleted %d scale sets, cleaned %d workdirs\n", deletedSets, cleanedDirs)
+	return nil
+}
+
+func stopDaemonIfRunning() {
+	label := launchd.DefaultLabel()
+	pid, running := launchd.Status(label)
+	if !running {
+		return
+	}
+
+	fmt.Printf("stopping running daemon (pid=%d)...\n", pid)
+	sigErr := syscall.Kill(pid, syscall.SIGTERM)
+	if sigErr != nil {
+		fmt.Printf("  stop warning: %v\n", sigErr)
+	} else {
+		waitForExit(pid, 30*time.Second)
+	}
+
+	uninstallErr := launchd.Uninstall(label)
+	if uninstallErr != nil {
+		fmt.Printf("  uninstall warning: %v\n", uninstallErr)
+	}
+}
+
+func purgeScaleSets(ctx context.Context, ghClient *github.Client, cfg *config.Config, force bool, timeout time.Duration) int {
+	deletedSets := 0
+	for _, g := range cfg.Groups {
+		fmt.Printf("purging scale set %q...\n", g.Name)
+		ss, getErr := ghClient.GetScaleSet(ctx, 1, g.Name)
+		if getErr != nil {
+			fmt.Printf("  cannot look up scale set %q, skipping: %v\n", g.Name, getErr)
+			continue
+		}
+		if ss == nil {
+			continue
+		}
+
+		if !force {
+			waitForIdleRunners(ctx, ghClient, ss.ID, g.Name, timeout)
+		}
+
+		if delErr := ghClient.DeleteScaleSet(ctx, ss.ID); delErr != nil {
+			fmt.Printf("  failed to delete scale set %q: %v\n", g.Name, delErr)
+			continue
+		}
+		deletedSets++
+		fmt.Printf("  deleted scale set %q (id=%d)\n", g.Name, ss.ID)
+	}
+	return deletedSets
+}
+
+func waitForIdleRunners(ctx context.Context, ghClient *github.Client, scaleSetID int, name string, timeout time.Duration) {
+	deadline := time.Now().Add(timeout)
+	pollInterval := 5 * time.Second
+
+	for time.Now().Before(deadline) {
+		ss, err := ghClient.GetScaleSetByID(ctx, scaleSetID)
+		if err != nil {
+			fmt.Printf("  warning: cannot check scale set %q status: %v\n", name, err)
+			return
+		}
+
+		if ss.Statistics == nil || ss.Statistics.TotalBusyRunners == 0 {
+			return
+		}
+
+		fmt.Printf("  waiting for %d busy runners in %q...\n", ss.Statistics.TotalBusyRunners, name)
+
+		select {
+		case <-ctx.Done():
+			return
+		case <-time.After(pollInterval):
+		}
+	}
+
+	fmt.Printf("  timeout waiting for idle runners in %q, proceeding with delete\n", name)
+}
+
+func cleanupWorkdirs(workdirBase string) int {
+	entries, err := os.ReadDir(workdirBase)
+	if err != nil {
+		return 0
+	}
+
+	count := 0
+	for _, e := range entries {
+		if !e.IsDir() {
+			continue
+		}
+		p := filepath.Join(workdirBase, e.Name())
+		if rmErr := os.RemoveAll(p); rmErr != nil {
+			fmt.Printf("  failed to remove workdir %s: %v\n", p, rmErr)
+			continue
+		}
+		count++
+	}
+	return count
+}
+
+func cleanupStateFiles(stateDir string) {
+	for _, name := range []string{"daemon.pid", "daemon.state.json", "ghr.sock"} {
+		p := filepath.Join(stateDir, name)
+		rmErr := os.Remove(p)
+		if rmErr != nil && !os.IsNotExist(rmErr) {
+			fmt.Printf("  failed to remove %s: %v\n", p, rmErr)
+		}
+	}
+}
diff --git a/internal/cli/restart.go b/internal/cli/restart.go
new file mode 100644
index 0000000..64d61bf
--- /dev/null
+++ b/internal/cli/restart.go
@@ -0,0 +1,46 @@
+package cli
+
+import (
+	"fmt"
+	"syscall"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/launchd"
+	"github.com/spf13/cobra"
+)
+
+func newRestartCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "restart",
+		Short: "Restart the ghr daemon",
+		RunE:  runRestart,
+	}
+	return cmd
+}
+
+func runRestart(cmd *cobra.Command, args []string) error {
+	if cfgFile == "" {
+		stateDir := resolveStateDir()
+		if state, err := readDaemonState(stateDir); err == nil && state.ConfigPath != "" {
+			cfgFile = state.ConfigPath
+		}
+	}
+
+	label := launchd.DefaultLabel()
+	pid, running := launchd.Status(label)
+	if running && pid > 0 {
+		fmt.Printf("stopping ghr (pid=%d)...\n", pid)
+
+		if err := syscall.Kill(pid, syscall.SIGTERM); err != nil {
+			fmt.Printf("stop warning: %v\n", err)
+		} else {
+			waitForExit(pid, 30*time.Second)
+		}
+
+		if err := launchd.Uninstall(label); err != nil {
+			fmt.Printf("uninstall warning: %v\n", err)
+		}
+	}
+
+	return runStart(cmd, args)
+}
diff --git a/internal/cli/root.go b/internal/cli/root.go
new file mode 100644
index 0000000..d9fc581
--- /dev/null
+++ b/internal/cli/root.go
@@ -0,0 +1,42 @@
+package cli
+
+import "github.com/spf13/cobra"
+
+var (
+	cfgFile   string
+	tokenFlag string
+	logLevel  string
+)
+
+func newRootCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:           "ghr",
+		Short:         "GitHub Actions runner controller for macOS",
+		Long:          "ghr manages ephemeral GitHub Actions runners via scale sets on macOS.",
+		SilenceUsage:  true,
+		SilenceErrors: true,
+	}
+
+	cmd.PersistentFlags().StringVar(&cfgFile, "config", "", "path to config file")
+	cmd.PersistentFlags().StringVar(&tokenFlag, "token", "", "override auth token for this invocation")
+	cmd.PersistentFlags().StringVar(&logLevel, "log-level", "", "override log level (debug/info/warn/error)")
+
+	cmd.AddCommand(
+		newStartCmd(),
+		newStopCmd(),
+		newRestartCmd(),
+		newRunCmd(),
+		newStatusCmd(),
+		newPurgeCmd(),
+		newLoginCmd(),
+		newLogoutCmd(),
+		newAuthCmd(),
+		newVersionCmd(),
+	)
+
+	return cmd
+}
+
+func Execute() error {
+	return newRootCmd().Execute()
+}
diff --git a/internal/cli/run.go b/internal/cli/run.go
new file mode 100644
index 0000000..9f0526a
--- /dev/null
+++ b/internal/cli/run.go
@@ -0,0 +1,129 @@
+package cli
+
+import (
+	"context"
+	"fmt"
+	"os/signal"
+	"syscall"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/config"
+	"github.com/oklog/run"
+	"github.com/spf13/cobra"
+)
+
+func newRunCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "run",
+		Short: "Run the ghr daemon in foreground",
+		RunE:  runRun,
+	}
+	return cmd
+}
+
+func runRun(_ *cobra.Command, _ []string) error {
+	if cfgFile == "" {
+		return fmt.Errorf("--config is required")
+	}
+
+	cfg, err := config.Load(cfgFile)
+	if err != nil {
+		return fmt.Errorf("load config: %w", err)
+	}
+
+	creds, source, err := auth.Load(auth.LoadOpts{TokenFlag: tokenFlag})
+	if err != nil {
+		return err
+	}
+
+	githubURL, err := resolveGitHubURL(creds, cfg)
+	if err != nil {
+		return err
+	}
+
+	if logLevel != "" {
+		cfg.Logging.Level = logLevel
+	}
+
+	d, err := buildDaemon(cfg, creds, githubURL)
+	if err != nil {
+		return err
+	}
+	defer d.logMgr.Close()
+
+	d.logger.Info("ghr starting",
+		"config", cfgFile,
+		"groups", len(cfg.Groups),
+		"auth_source", source,
+		"auth_method", creds.Method,
+	)
+
+	pidPath := pidFilePath(cfg.Daemon.StateDir)
+	if err := writePIDFile(pidPath); err != nil {
+		return fmt.Errorf("write pid file: %w", err)
+	}
+	defer removePIDFile(pidPath)
+
+	if err := writeDaemonState(cfg.Daemon.StateDir, cfgFile); err != nil {
+		return fmt.Errorf("write daemon state: %w", err)
+	}
+	defer removeDaemonState(cfg.Daemon.StateDir)
+
+	return runDaemonGroup(d)
+}
+
+func runDaemonGroup(d *daemon) error {
+	var g run.Group
+
+	{
+		ctx, cancel := context.WithCancel(context.Background())
+		g.Add(
+			func() error { return d.ctrl.Run(ctx) },
+			func(error) { cancel() },
+		)
+	}
+
+	{
+		ctx, cancel := context.WithCancel(context.Background())
+		g.Add(
+			func() error { return d.health.Run(ctx) },
+			func(error) { cancel() },
+		)
+	}
+
+	{
+		ctx, cancel := context.WithCancel(context.Background())
+		g.Add(
+			func() error { return d.api.Run(ctx) },
+			func(error) { cancel() },
+		)
+	}
+
+	{
+		ctx, cancel := context.WithCancel(context.Background())
+		g.Add(
+			func() error { return d.logMgr.StartCleanupScheduler(ctx) },
+			func(error) { cancel() },
+		)
+	}
+
+	{
+		ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
+		g.Add(
+			func() error {
+				<-ctx.Done()
+				return nil
+			},
+			func(error) { cancel() },
+		)
+	}
+
+	groupErr := g.Run()
+
+	shutdownCtx, cancel := context.WithTimeout(context.Background(), d.cfg.Daemon.ShutdownTimeout.Duration)
+	defer cancel()
+	d.ctrl.Shutdown(shutdownCtx)
+
+	d.logger.Info("ghr stopped")
+	return groupErr
+}
diff --git a/internal/cli/start.go b/internal/cli/start.go
new file mode 100644
index 0000000..458f07f
--- /dev/null
+++ b/internal/cli/start.go
@@ -0,0 +1,111 @@
+package cli
+
+import (
+	"fmt"
+	"os"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/config"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/launchd"
+	"github.com/spf13/cobra"
+)
+
+func newStartCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "start",
+		Short: "Start the ghr daemon via launchd",
+		RunE:  runStart,
+	}
+
+	cmd.Flags().Bool("foreground", false, "run in foreground (same as 'ghr run')")
+
+	return cmd
+}
+
+func runStart(cmd *cobra.Command, args []string) error {
+	foreground, err := cmd.Flags().GetBool("foreground")
+	if err == nil && foreground {
+		return runRun(cmd, args)
+	}
+
+	if cfgFile == "" {
+		return fmt.Errorf("--config is required")
+	}
+
+	cfg, err := config.Load(cfgFile)
+	if err != nil {
+		return fmt.Errorf("load config: %w", err)
+	}
+
+	label := launchd.DefaultLabel()
+	if launchd.IsRunning(label) {
+		pid, _ := launchd.Status(label)
+		fmt.Printf("ghr is already running (pid=%d)\n", pid)
+		return nil
+	}
+
+	binaryPath, err := os.Executable()
+	if err != nil {
+		return fmt.Errorf("resolve binary path: %w", err)
+	}
+
+	svcCfg := launchd.ServiceConfig{
+		Label:      label,
+		BinaryPath: binaryPath,
+		ConfigPath: cfgFile,
+		LogDir:     cfg.Logging.Dir,
+		StateDir:   cfg.Daemon.StateDir,
+	}
+
+	if err := launchd.Install(&svcCfg); err != nil {
+		return fmt.Errorf("install launchd service: %w", err)
+	}
+
+	pid := waitForPID(cfg.Daemon.StateDir, 5*time.Second)
+
+	serviceType := "LaunchAgent"
+	if os.Getuid() == 0 {
+		serviceType = "LaunchDaemon"
+	}
+
+	if pid > 0 {
+		fmt.Printf("ghr started (pid=%d)\n", pid)
+	} else {
+		fmt.Println("ghr started")
+	}
+	fmt.Printf("Service: %s (%s)\n", label, serviceType)
+	fmt.Printf("Config:  %s\n", cfgFile)
+	fmt.Printf("Groups:  %d", len(cfg.Groups))
+	if len(cfg.Groups) > 0 {
+		fmt.Print(" (")
+		for i, g := range cfg.Groups {
+			if i > 0 {
+				fmt.Print(", ")
+			}
+			fmt.Print(g.Name)
+		}
+		fmt.Print(")")
+	}
+	fmt.Println()
+	fmt.Printf("Logs:    %s\n", cfg.Logging.Dir)
+
+	return nil
+}
+
+func waitForPID(stateDir string, timeout time.Duration) int {
+	pidPath := pidFilePath(stateDir)
+	deadline := time.Now().Add(timeout)
+
+	for time.Now().Before(deadline) {
+		data, err := os.ReadFile(pidPath)
+		if err == nil && len(data) > 0 {
+			var pid int
+			if _, scanErr := fmt.Sscanf(string(data), "%d", &pid); scanErr == nil && pid > 0 {
+				return pid
+			}
+		}
+		time.Sleep(500 * time.Millisecond)
+	}
+
+	return 0
+}
diff --git a/internal/cli/state.go b/internal/cli/state.go
new file mode 100644
index 0000000..3dc7d42
--- /dev/null
+++ b/internal/cli/state.go
@@ -0,0 +1,62 @@
+package cli
+
+import (
+	"encoding/json"
+	"fmt"
+	"os"
+	"path/filepath"
+	"time"
+)
+
+const stateFileName = "daemon.state.json"
+
+type daemonState struct {
+	ConfigPath string         `json:"config_path"`
+	StartedAt  time.Time      `json:"started_at"`
+	PID        int            `json:"pid"`
+	Groups     map[string]int `json:"groups"`
+}
+
+func writeDaemonState(stateDir, configPath string) error {
+	state := daemonState{
+		ConfigPath: configPath,
+		StartedAt:  time.Now(),
+		PID:        os.Getpid(),
+		Groups:     make(map[string]int),
+	}
+
+	data, err := json.MarshalIndent(state, "", "  ")
+	if err != nil {
+		return fmt.Errorf("marshal daemon state: %w", err)
+	}
+
+	dir := stateDir
+	if err := os.MkdirAll(dir, 0o755); err != nil {
+		return fmt.Errorf("create state directory %s: %w", dir, err)
+	}
+
+	path := filepath.Join(dir, stateFileName)
+	if err := os.WriteFile(path, data, 0o644); err != nil {
+		return fmt.Errorf("write daemon state %s: %w", path, err)
+	}
+	return nil
+}
+
+func readDaemonState(stateDir string) (*daemonState, error) {
+	path := filepath.Join(stateDir, stateFileName)
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return nil, fmt.Errorf("read daemon state %s: %w", path, err)
+	}
+
+	var state daemonState
+	if err := json.Unmarshal(data, &state); err != nil {
+		return nil, fmt.Errorf("parse daemon state %s: %w", path, err)
+	}
+	return &state, nil
+}
+
+func removeDaemonState(stateDir string) {
+	path := filepath.Join(stateDir, stateFileName)
+	_ = os.Remove(path)
+}
diff --git a/internal/cli/status.go b/internal/cli/status.go
new file mode 100644
index 0000000..e5af2f5
--- /dev/null
+++ b/internal/cli/status.go
@@ -0,0 +1,134 @@
+package cli
+
+import (
+	"context"
+	"fmt"
+	"io"
+	"net"
+	"net/http"
+	"os"
+	"path/filepath"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/config"
+	"github.com/spf13/cobra"
+)
+
+func newStatusCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "status",
+		Short: "Show ghr daemon status",
+		RunE:  runStatus,
+	}
+
+	cmd.Flags().Bool("json", false, "output in JSON format")
+	cmd.Flags().Bool("watch", false, "live refresh mode")
+	cmd.Flags().Duration("interval", 5*time.Second, "refresh interval for --watch")
+
+	return cmd
+}
+
+func runStatus(cmd *cobra.Command, _ []string) error {
+	jsonOutput, err := cmd.Flags().GetBool("json")
+	if err != nil {
+		return fmt.Errorf("get json flag: %w", err)
+	}
+
+	watch, err := cmd.Flags().GetBool("watch")
+	if err != nil {
+		return fmt.Errorf("get watch flag: %w", err)
+	}
+
+	interval, err := cmd.Flags().GetDuration("interval")
+	if err != nil {
+		return fmt.Errorf("get interval flag: %w", err)
+	}
+
+	stateDir := resolveStateDir()
+	socketPath := filepath.Join(stateDir, "ghr.sock")
+
+	if !watch {
+		return renderOnce(socketPath, stateDir, jsonOutput)
+	}
+
+	return runWatch(cmd.Context(), socketPath, stateDir, jsonOutput, interval)
+}
+
+func renderOnce(socketPath, stateDir string, jsonOutput bool) error {
+	resp, socketErr := querySocket(socketPath, "/status")
+	if socketErr != nil {
+		return showOfflineStatus(stateDir, jsonOutput)
+	}
+
+	if jsonOutput {
+		fmt.Println(string(resp))
+		return nil
+	}
+
+	return displayStatus(resp)
+}
+
+func runWatch(ctx context.Context, socketPath, stateDir string, jsonOutput bool, interval time.Duration) error {
+	ticker := time.NewTicker(interval)
+	defer ticker.Stop()
+
+	for {
+		if !jsonOutput {
+			fmt.Print("\033[H\033[2J")
+		}
+
+		renderErr := renderOnce(socketPath, stateDir, jsonOutput)
+		if renderErr != nil && !jsonOutput {
+			fmt.Fprintf(os.Stderr, "status error: %v\n", renderErr)
+		}
+
+		select {
+		case <-ctx.Done():
+			return nil
+		case <-ticker.C:
+		}
+	}
+}
+
+func resolveStateDir() string {
+	if cfgFile != "" {
+		cfg, err := config.Load(cfgFile)
+		if err == nil {
+			return cfg.Daemon.StateDir
+		}
+	}
+
+	if os.Getuid() == 0 {
+		return "/var/lib/ghr/state"
+	}
+
+	home, err := os.UserHomeDir()
+	if err != nil {
+		return "."
+	}
+	return filepath.Join(home, ".local", "state", "ghr")
+}
+
+func querySocket(socketPath, endpoint string) ([]byte, error) {
+	client := &http.Client{
+		Transport: &http.Transport{
+			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
+				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
+			},
+		},
+		Timeout: 5 * time.Second,
+	}
+
+	resp, err := client.Get("http://unix" + endpoint)
+	if err != nil {
+		return nil, fmt.Errorf("connect to daemon socket: %w", err)
+	}
+	defer resp.Body.Close()
+
+	body, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, fmt.Errorf("read socket response: %w", err)
+	}
+
+	return body, nil
+}
diff --git a/internal/cli/status_render.go b/internal/cli/status_render.go
new file mode 100644
index 0000000..95da2ff
--- /dev/null
+++ b/internal/cli/status_render.go
@@ -0,0 +1,169 @@
+package cli
+
+import (
+	"encoding/json"
+	"fmt"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/launchd"
+)
+
+type statusResponse struct {
+	Groups map[string][]statusRunner `json:"groups"`
+	Health statusHealth              `json:"health"`
+}
+
+type statusRunner struct {
+	Name    string `json:"name"`
+	State   string `json:"state"`
+	PID     int    `json:"pid"`
+	JobName string `json:"job_name"`
+}
+
+type statusHealthIssue struct {
+	Level   string `json:"level"`
+	Type    string `json:"type"`
+	Group   string `json:"group"`
+	Runner  string `json:"runner"`
+	Message string `json:"message"`
+}
+
+type statusHealth struct {
+	LastCheck string              `json:"last_check"`
+	Issues    []statusHealthIssue `json:"issues"`
+}
+
+func showOfflineStatus(stateDir string, jsonOutput bool) error {
+	label := launchd.DefaultLabel()
+	pid, running := launchd.Status(label)
+
+	if jsonOutput {
+		svcStatus := "stopped"
+		if running {
+			svcStatus = "running"
+		}
+		status := map[string]interface{}{"status": svcStatus, "running": running, "pid": pid}
+		data, err := json.MarshalIndent(status, "", "  ")
+		if err != nil {
+			return fmt.Errorf("marshal status: %w", err)
+		}
+		fmt.Println(string(data))
+		return nil
+	}
+
+	fmt.Println("Service")
+	if running {
+		fmt.Printf("  Status:    running (via launchd, pid=%d)\n", pid)
+		fmt.Println("  Note:      daemon socket not available")
+	} else {
+		fmt.Println("  Status:    stopped")
+	}
+
+	if state, readErr := readDaemonState(stateDir); readErr == nil {
+		fmt.Printf("  Config:    %s\n", state.ConfigPath)
+		fmt.Printf("  Started:   %s\n", state.StartedAt.Format(time.RFC3339))
+	}
+
+	fmt.Println()
+	fmt.Println("No active groups or runners.")
+	fmt.Println("Use 'ghr start' to start the daemon.")
+
+	return nil
+}
+
+func displayStatus(data []byte) error {
+	var status statusResponse
+	if err := json.Unmarshal(data, &status); err != nil {
+		return fmt.Errorf("parse status response: %w", err)
+	}
+
+	label := launchd.DefaultLabel()
+	pid, _ := launchd.Status(label)
+
+	renderServiceSection(pid, "")
+	renderGroupsTable(status.Groups)
+	renderRunnersTable(status.Groups)
+	renderHealthSection(status.Health)
+
+	return nil
+}
+
+func renderServiceSection(pid int, configPath string) {
+	fmt.Println("Service")
+	fmt.Println("  Status:    running")
+	if pid > 0 {
+		fmt.Printf("  PID:       %d\n", pid)
+	}
+	if configPath != "" {
+		fmt.Printf("  Config:    %s\n", configPath)
+	}
+	fmt.Println()
+}
+
+func renderGroupsTable(groups map[string][]statusRunner) {
+	fmt.Println("Groups")
+	fmt.Printf("  %-20s %5s %7s %5s %8s\n", "Name", "Count", "Running", "Idle", "Health")
+	fmt.Printf("  %-20s %5s %7s %5s %8s\n", "----", "-----", "-------", "----", "------")
+
+	totalRunning := 0
+	totalIdle := 0
+
+	for group, runners := range groups {
+		running := 0
+		idle := 0
+		for _, r := range runners {
+			if r.State == "busy" {
+				running++
+			} else {
+				idle++
+			}
+		}
+		totalRunning += running
+		totalIdle += idle
+		fmt.Printf("  %-20s %5d %7d %5d %8s\n", group, len(runners), running, idle, "OK")
+	}
+
+	fmt.Printf("  Total: running=%d  idle=%d\n", totalRunning, totalIdle)
+	fmt.Println()
+}
+
+func renderRunnersTable(groups map[string][]statusRunner) {
+	hasRunners := false
+	for _, runners := range groups {
+		if len(runners) > 0 {
+			hasRunners = true
+			break
+		}
+	}
+	if !hasRunners {
+		return
+	}
+
+	fmt.Println("Runners")
+	fmt.Printf("  %-30s %-8s %-25s %6s\n", "Runner", "Status", "Job", "PID")
+	fmt.Printf("  %-30s %-8s %-25s %6s\n", "------", "------", "---", "---")
+
+	for _, runners := range groups {
+		for _, r := range runners {
+			job := r.JobName
+			if job == "" {
+				job = "-"
+			}
+			fmt.Printf("  %-30s %-8s %-25s %6d\n", r.Name, r.State, job, r.PID)
+		}
+	}
+	fmt.Println()
+}
+
+func renderHealthSection(h statusHealth) {
+	fmt.Println("Health")
+	if h.LastCheck != "" {
+		fmt.Printf("  Last check:  %s\n", h.LastCheck)
+	} else {
+		fmt.Println("  Last check:  n/a")
+	}
+	fmt.Printf("  Issues:      %d\n", len(h.Issues))
+	for _, issue := range h.Issues {
+		fmt.Printf("    [%s] %s: %s\n", issue.Level, issue.Type, issue.Message)
+	}
+}
diff --git a/internal/cli/stop.go b/internal/cli/stop.go
new file mode 100644
index 0000000..a14e433
--- /dev/null
+++ b/internal/cli/stop.go
@@ -0,0 +1,78 @@
+package cli
+
+import (
+	"fmt"
+	"syscall"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/launchd"
+	"github.com/spf13/cobra"
+)
+
+func newStopCmd() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "stop",
+		Short: "Stop the ghr daemon",
+		RunE:  runStop,
+	}
+
+	cmd.Flags().Duration("timeout", 30*time.Second, "max wait for graceful shutdown")
+	cmd.Flags().Bool("force", false, "skip SIGTERM, send SIGKILL immediately")
+
+	return cmd
+}
+
+func runStop(cmd *cobra.Command, _ []string) error {
+	timeout, err := cmd.Flags().GetDuration("timeout")
+	if err != nil {
+		return fmt.Errorf("get timeout flag: %w", err)
+	}
+
+	force, err := cmd.Flags().GetBool("force")
+	if err != nil {
+		return fmt.Errorf("get force flag: %w", err)
+	}
+
+	label := launchd.DefaultLabel()
+	pid, running := launchd.Status(label)
+	if !running {
+		fmt.Println("ghr is not running")
+		return nil
+	}
+
+	if force {
+		if err := syscall.Kill(pid, syscall.SIGKILL); err != nil {
+			return fmt.Errorf("send SIGKILL to pid %d: %w", pid, err)
+		}
+	} else {
+		if err := syscall.Kill(pid, syscall.SIGTERM); err != nil {
+			return fmt.Errorf("send SIGTERM to pid %d: %w", pid, err)
+		}
+
+		if !waitForExit(pid, timeout) {
+			fmt.Println("graceful shutdown timed out, sending SIGKILL")
+			if err := syscall.Kill(pid, syscall.SIGKILL); err != nil {
+				return fmt.Errorf("send SIGKILL to pid %d: %w", pid, err)
+			}
+		}
+	}
+
+	uninstallErr := launchd.Uninstall(label)
+	if uninstallErr != nil {
+		return fmt.Errorf("uninstall launchd service: %w", uninstallErr)
+	}
+
+	fmt.Println("ghr stopped")
+	return nil
+}
+
+func waitForExit(pid int, timeout time.Duration) bool {
+	deadline := time.Now().Add(timeout)
+	for time.Now().Before(deadline) {
+		if err := syscall.Kill(pid, 0); err == syscall.ESRCH {
+			return true
+		}
+		time.Sleep(500 * time.Millisecond)
+	}
+	return false
+}
diff --git a/internal/cli/version.go b/internal/cli/version.go
new file mode 100644
index 0000000..abb3c91
--- /dev/null
+++ b/internal/cli/version.go
@@ -0,0 +1,23 @@
+package cli
+
+import (
+	"fmt"
+
+	"github.com/spf13/cobra"
+)
+
+var (
+	version = "dev"
+	commit  = "none"
+	date    = "unknown"
+)
+
+func newVersionCmd() *cobra.Command {
+	return &cobra.Command{
+		Use:   "version",
+		Short: "Print version information",
+		Run: func(_ *cobra.Command, _ []string) {
+			fmt.Printf("ghr %s (commit: %s, built: %s)\n", version, commit, date)
+		},
+	}
+}
diff --git a/internal/config/bytesize.go b/internal/config/bytesize.go
new file mode 100644
index 0000000..727a116
--- /dev/null
+++ b/internal/config/bytesize.go
@@ -0,0 +1,61 @@
+package config
+
+import (
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+const (
+	bytesPerKB int64 = 1000
+	bytesPerMB int64 = 1000 * 1000
+	bytesPerGB int64 = 1000 * 1000 * 1000
+	bytesPerTB int64 = 1000 * 1000 * 1000 * 1000
+)
+
+func ParseByteSize(s string) (int64, error) {
+	s = strings.TrimSpace(s)
+	if s == "" {
+		return 0, fmt.Errorf("parse byte size: empty string")
+	}
+
+	upper := strings.ToUpper(s)
+
+	suffixes := []struct {
+		suffix     string
+		multiplier int64
+	}{
+		{"TB", bytesPerTB},
+		{"GB", bytesPerGB},
+		{"MB", bytesPerMB},
+		{"KB", bytesPerKB},
+		{"B", 1},
+	}
+
+	for _, entry := range suffixes {
+		if !strings.HasSuffix(upper, entry.suffix) {
+			continue
+		}
+		numStr := strings.TrimSpace(s[:len(s)-len(entry.suffix)])
+		if numStr == "" {
+			return 0, fmt.Errorf("parse byte size %q: missing numeric value", s)
+		}
+		n, err := strconv.ParseFloat(numStr, 64)
+		if err != nil {
+			return 0, fmt.Errorf("parse byte size %q: %w", s, err)
+		}
+		if n < 0 {
+			return 0, fmt.Errorf("parse byte size %q: negative value", s)
+		}
+		return int64(n * float64(entry.multiplier)), nil
+	}
+
+	n, err := strconv.ParseInt(s, 10, 64)
+	if err != nil {
+		return 0, fmt.Errorf("parse byte size %q: %w", s, err)
+	}
+	if n < 0 {
+		return 0, fmt.Errorf("parse byte size %q: negative value", s)
+	}
+	return n, nil
+}
diff --git a/internal/config/bytesize_test.go b/internal/config/bytesize_test.go
new file mode 100644
index 0000000..4abdb78
--- /dev/null
+++ b/internal/config/bytesize_test.go
@@ -0,0 +1,71 @@
+package config
+
+import (
+	"testing"
+)
+
+func TestParseByteSize(t *testing.T) {
+	tests := []struct {
+		name    string
+		input   string
+		want    int64
+		wantErr bool
+	}{
+		// Raw byte values.
+		{name: "numeric only", input: "1024", want: 1024},
+		{name: "zero", input: "0", want: 0},
+		{name: "explicit B suffix", input: "1B", want: 1},
+		{name: "large bytes", input: "999999", want: 999999},
+
+		// KB (1000-based).
+		{name: "1KB", input: "1KB", want: 1000},
+		{name: "lowercase kb", input: "1kb", want: 1000},
+		{name: "mixed case Kb", input: "1Kb", want: 1000},
+		{name: "500KB", input: "500KB", want: 500_000},
+
+		// MB.
+		{name: "1MB", input: "1MB", want: 1_000_000},
+		{name: "500MB", input: "500MB", want: 500_000_000},
+
+		// GB.
+		{name: "1GB", input: "1GB", want: 1_000_000_000},
+		{name: "10GB", input: "10GB", want: 10_000_000_000},
+
+		// TB.
+		{name: "1TB", input: "1TB", want: 1_000_000_000_000},
+		{name: "2TB", input: "2TB", want: 2_000_000_000_000},
+
+		// Fractional values (supported for suffixed inputs via ParseFloat).
+		{name: "1.5GB", input: "1.5GB", want: 1_500_000_000},
+		{name: "0.5MB", input: "0.5MB", want: 500_000},
+
+		// Whitespace handling.
+		{name: "leading/trailing spaces", input: "  100MB  ", want: 100_000_000},
+
+		// Error cases.
+		{name: "empty string", input: "", wantErr: true},
+		{name: "pure alpha", input: "abc", wantErr: true},
+		{name: "negative GB", input: "-1GB", wantErr: true},
+		{name: "negative raw", input: "-100", wantErr: true},
+		{name: "suffix only KB", input: "KB", wantErr: true},
+		{name: "suffix only B", input: "B", wantErr: true},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got, err := ParseByteSize(tt.input)
+			if tt.wantErr {
+				if err == nil {
+					t.Errorf("ParseByteSize(%q) = %d, want error", tt.input, got)
+				}
+				return
+			}
+			if err != nil {
+				t.Fatalf("ParseByteSize(%q) unexpected error: %v", tt.input, err)
+			}
+			if got != tt.want {
+				t.Errorf("ParseByteSize(%q) = %d, want %d", tt.input, got, tt.want)
+			}
+		})
+	}
+}
diff --git a/internal/config/loader.go b/internal/config/loader.go
new file mode 100644
index 0000000..1d69109
--- /dev/null
+++ b/internal/config/loader.go
@@ -0,0 +1,148 @@
+package config
+
+import (
+	"fmt"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+
+	"github.com/joho/godotenv"
+	"gopkg.in/yaml.v3"
+)
+
+func Load(path string) (*Config, error) {
+	_ = godotenv.Load()
+	configDir := filepath.Dir(path)
+	_ = godotenv.Load(filepath.Join(configDir, ".env"))
+
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return nil, fmt.Errorf("read config file %q: %w", path, err)
+	}
+
+	cfg := &Config{}
+	if err := yaml.Unmarshal(data, cfg); err != nil {
+		return nil, fmt.Errorf("parse config file %q: %w", path, err)
+	}
+
+	applyDefaults(cfg)
+	resolveEnvVars(cfg)
+
+	if err := validate(cfg); err != nil {
+		return nil, fmt.Errorf("validate config: %w", err)
+	}
+
+	return cfg, nil
+}
+
+func applyDefaults(cfg *Config) {
+	isRoot := os.Getuid() == 0
+
+	var dataDir, logDir, stateDir string
+	if isRoot {
+		dataDir = "/var/lib/ghr"
+		logDir = "/var/log/ghr"
+		stateDir = "/var/lib/ghr/state"
+	} else {
+		home, err := os.UserHomeDir()
+		if err != nil {
+			home = "."
+		}
+		dataDir = filepath.Join(home, ".local", "share", "ghr")
+		logDir = filepath.Join(home, ".local", "share", "ghr", "logs")
+		stateDir = filepath.Join(home, ".local", "state", "ghr")
+	}
+
+	if cfg.GitHub.RunnerGroup == "" {
+		cfg.GitHub.RunnerGroup = "default"
+	}
+
+	if cfg.Runner.Version == "" {
+		cfg.Runner.Version = "latest"
+	}
+	if cfg.Runner.CacheDir == "" {
+		cfg.Runner.CacheDir = filepath.Join(dataDir, "cache")
+	}
+	if cfg.Runner.WorkdirBase == "" {
+		cfg.Runner.WorkdirBase = filepath.Join(dataDir, "runners")
+	}
+
+	if isHealthZero(cfg.Health) {
+		cfg.Health.Enabled = true
+	}
+	if cfg.Health.CheckInterval.Duration == 0 {
+		cfg.Health.CheckInterval = Duration{30 * time.Second}
+	}
+	if cfg.Health.RunnerTimeout.Duration == 0 {
+		cfg.Health.RunnerTimeout = Duration{2 * time.Hour}
+	}
+	if cfg.Health.DivergenceTimeout.Duration == 0 {
+		cfg.Health.DivergenceTimeout = Duration{5 * time.Minute}
+	}
+	if cfg.Health.MaxConsecutiveFailures == 0 {
+		cfg.Health.MaxConsecutiveFailures = 5
+	}
+	if cfg.Health.FailureCooldown.Duration == 0 {
+		cfg.Health.FailureCooldown = Duration{1 * time.Minute}
+	}
+	if cfg.Health.MinDiskSpace == "" {
+		cfg.Health.MinDiskSpace = "1GB"
+	}
+
+	if cfg.Logging.Level == "" {
+		cfg.Logging.Level = "info"
+	}
+	if cfg.Logging.Format == "" {
+		cfg.Logging.Format = "text"
+	}
+	if cfg.Logging.Dir == "" {
+		cfg.Logging.Dir = logDir
+	}
+	if cfg.Logging.RetentionDays == 0 {
+		cfg.Logging.RetentionDays = 30
+	}
+	if cfg.Logging.RunnerOutput == nil {
+		t := true
+		cfg.Logging.RunnerOutput = &t
+	}
+
+	if cfg.Notifications.Discord.Username == "" {
+		cfg.Notifications.Discord.Username = "ghr"
+	}
+
+	if cfg.Daemon.StateDir == "" {
+		cfg.Daemon.StateDir = stateDir
+	}
+	if cfg.Daemon.ShutdownTimeout.Duration == 0 {
+		cfg.Daemon.ShutdownTimeout = Duration{30 * time.Second}
+	}
+}
+
+func resolveEnvVars(cfg *Config) {
+	if v := os.Getenv("GHR_DISCORD_WEBHOOK_URL"); v != "" {
+		cfg.Notifications.Discord.WebhookURL = v
+	}
+	if v := os.Getenv("GHR_UPTIME_KUMA_URL"); v != "" {
+		cfg.Monitoring.UptimeKuma.BaseURL = v
+	}
+	if v := os.Getenv("GHR_UPTIME_KUMA_DAEMON_TOKEN"); v != "" {
+		cfg.Monitoring.UptimeKuma.DaemonToken = v
+	}
+	resolveUptimeKumaGroupTokens(cfg)
+}
+
+func resolveUptimeKumaGroupTokens(cfg *Config) {
+	if len(cfg.Groups) == 0 {
+		return
+	}
+	if cfg.Monitoring.UptimeKuma.GroupTokens == nil {
+		cfg.Monitoring.UptimeKuma.GroupTokens = make(map[string]string, len(cfg.Groups))
+	}
+	for _, g := range cfg.Groups {
+		envKey := "GHR_UPTIME_KUMA_TOKEN_" + strings.ToUpper(strings.ReplaceAll(g.Name, "-", "_"))
+		if v := os.Getenv(envKey); v != "" {
+			cfg.Monitoring.UptimeKuma.GroupTokens[g.Name] = v
+		}
+	}
+}
diff --git a/internal/config/loader_test.go b/internal/config/loader_test.go
new file mode 100644
index 0000000..e6f19ee
--- /dev/null
+++ b/internal/config/loader_test.go
@@ -0,0 +1,563 @@
+package config
+
+import (
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+	"time"
+)
+
+// writeConfig writes a YAML string to a temp file and returns its path.
+func writeConfig(t *testing.T, yaml string) string {
+	t.Helper()
+	dir := t.TempDir()
+	path := filepath.Join(dir, "config.yaml")
+	if err := os.WriteFile(path, []byte(yaml), 0o644); err != nil {
+		t.Fatalf("write temp config: %v", err)
+	}
+	return path
+}
+
+func TestLoad_MinimalConfig(t *testing.T) {
+	yaml := `
+groups:
+  - name: test-group
+    max_runners: 2
+`
+	cfg, err := Load(writeConfig(t, yaml))
+	if err != nil {
+		t.Fatalf("Load() unexpected error: %v", err)
+	}
+
+	// Group values.
+	if len(cfg.Groups) != 1 {
+		t.Fatalf("expected 1 group, got %d", len(cfg.Groups))
+	}
+	g := cfg.Groups[0]
+	if g.Name != "test-group" {
+		t.Errorf("group name = %q, want %q", g.Name, "test-group")
+	}
+	if g.MaxRunners != 2 {
+		t.Errorf("max_runners = %d, want 2", g.MaxRunners)
+	}
+
+	// Defaults: runner.
+	if cfg.Runner.Version != "latest" {
+		t.Errorf("runner.version = %q, want %q", cfg.Runner.Version, "latest")
+	}
+
+	// Defaults: github.
+	if cfg.GitHub.RunnerGroup != "default" {
+		t.Errorf("github.runner_group = %q, want %q", cfg.GitHub.RunnerGroup, "default")
+	}
+
+	// Defaults: health.
+	if !cfg.Health.Enabled {
+		t.Error("health.enabled = false, want true (default)")
+	}
+	if cfg.Health.CheckInterval.Duration != 30*time.Second {
+		t.Errorf("health.check_interval = %v, want 30s", cfg.Health.CheckInterval.Duration)
+	}
+	if cfg.Health.RunnerTimeout.Duration != 2*time.Hour {
+		t.Errorf("health.runner_timeout = %v, want 2h", cfg.Health.RunnerTimeout.Duration)
+	}
+	if cfg.Health.DivergenceTimeout.Duration != 5*time.Minute {
+		t.Errorf("health.divergence_timeout = %v, want 5m", cfg.Health.DivergenceTimeout.Duration)
+	}
+	if cfg.Health.MaxConsecutiveFailures != 5 {
+		t.Errorf("health.max_consecutive_failures = %d, want 5", cfg.Health.MaxConsecutiveFailures)
+	}
+	if cfg.Health.FailureCooldown.Duration != 1*time.Minute {
+		t.Errorf("health.failure_cooldown = %v, want 1m", cfg.Health.FailureCooldown.Duration)
+	}
+	if cfg.Health.MinDiskSpace != "1GB" {
+		t.Errorf("health.min_disk_space = %q, want %q", cfg.Health.MinDiskSpace, "1GB")
+	}
+
+	// Defaults: logging.
+	if cfg.Logging.Level != "info" {
+		t.Errorf("logging.level = %q, want %q", cfg.Logging.Level, "info")
+	}
+	if cfg.Logging.Format != "text" {
+		t.Errorf("logging.format = %q, want %q", cfg.Logging.Format, "text")
+	}
+	if cfg.Logging.RetentionDays != 30 {
+		t.Errorf("logging.retention_days = %d, want 30", cfg.Logging.RetentionDays)
+	}
+	if cfg.Logging.RunnerOutput == nil || !*cfg.Logging.RunnerOutput {
+		t.Error("logging.runner_output = false/nil, want true (default)")
+	}
+
+	// Defaults: notifications.
+	if cfg.Notifications.Discord.Username != "ghr" {
+		t.Errorf("notifications.discord.username = %q, want %q", cfg.Notifications.Discord.Username, "ghr")
+	}
+
+	// Defaults: daemon.
+	if cfg.Daemon.ShutdownTimeout.Duration != 30*time.Second {
+		t.Errorf("daemon.shutdown_timeout = %v, want 30s", cfg.Daemon.ShutdownTimeout.Duration)
+	}
+}
+
+func TestLoad_FullConfig(t *testing.T) {
+	yaml := `
+github:
+  url: "https://github.example.com"
+  runner_group: "custom-group"
+
+runner:
+  version: "2.320.0"
+  cache_dir: "/tmp/ghr-cache"
+  workdir_base: "/tmp/ghr-runners"
+
+groups:
+  - name: production
+    max_runners: 10
+    min_runners: 2
+    labels:
+      - self-hosted
+      - linux
+    runner_group: "prod-pool"
+    version: "2.319.0"
+  - name: staging
+    max_runners: 5
+    min_runners: 0
+    labels:
+      - staging
+
+health:
+  enabled: true
+  check_interval: "1m"
+  runner_timeout: "3h"
+  idle_timeout: "30m"
+  divergence_timeout: "10m"
+  max_consecutive_failures: 10
+  failure_cooldown: "2m"
+  min_disk_space: "5GB"
+
+logging:
+  level: "debug"
+  format: "json"
+  dir: "/tmp/ghr-logs"
+  retention_days: 14
+  runner_output: false
+
+notifications:
+  discord:
+    enabled: true
+    events:
+      - runner.started
+      - runner.failed
+    username: "my-bot"
+    mentions:
+      error: "<@&111>"
+      critical: "<@&222>"
+
+monitoring:
+  uptime_kuma:
+    enabled: true
+    degraded_threshold: 0.8
+    report_health_as_ping: true
+
+daemon:
+  state_dir: "/tmp/ghr-state"
+  shutdown_timeout: "1m"
+`
+	cfg, err := Load(writeConfig(t, yaml))
+	if err != nil {
+		t.Fatalf("Load() unexpected error: %v", err)
+	}
+
+	// GitHub.
+	if cfg.GitHub.URL != "https://github.example.com" {
+		t.Errorf("github.url = %q, want %q", cfg.GitHub.URL, "https://github.example.com")
+	}
+	if cfg.GitHub.RunnerGroup != "custom-group" {
+		t.Errorf("github.runner_group = %q, want %q", cfg.GitHub.RunnerGroup, "custom-group")
+	}
+
+	// Runner.
+	if cfg.Runner.Version != "2.320.0" {
+		t.Errorf("runner.version = %q, want %q", cfg.Runner.Version, "2.320.0")
+	}
+	if cfg.Runner.CacheDir != "/tmp/ghr-cache" {
+		t.Errorf("runner.cache_dir = %q, want %q", cfg.Runner.CacheDir, "/tmp/ghr-cache")
+	}
+	if cfg.Runner.WorkdirBase != "/tmp/ghr-runners" {
+		t.Errorf("runner.workdir_base = %q, want %q", cfg.Runner.WorkdirBase, "/tmp/ghr-runners")
+	}
+
+	// Groups.
+	if len(cfg.Groups) != 2 {
+		t.Fatalf("expected 2 groups, got %d", len(cfg.Groups))
+	}
+	prod := cfg.Groups[0]
+	if prod.Name != "production" {
+		t.Errorf("groups[0].name = %q, want %q", prod.Name, "production")
+	}
+	if prod.MaxRunners != 10 {
+		t.Errorf("groups[0].max_runners = %d, want 10", prod.MaxRunners)
+	}
+	if prod.MinRunners != 2 {
+		t.Errorf("groups[0].min_runners = %d, want 2", prod.MinRunners)
+	}
+	if len(prod.Labels) != 2 || prod.Labels[0] != "self-hosted" || prod.Labels[1] != "linux" {
+		t.Errorf("groups[0].labels = %v, want [self-hosted linux]", prod.Labels)
+	}
+	if prod.RunnerGroup != "prod-pool" {
+		t.Errorf("groups[0].runner_group = %q, want %q", prod.RunnerGroup, "prod-pool")
+	}
+	if prod.Version != "2.319.0" {
+		t.Errorf("groups[0].version = %q, want %q", prod.Version, "2.319.0")
+	}
+
+	staging := cfg.Groups[1]
+	if staging.Name != "staging" {
+		t.Errorf("groups[1].name = %q, want %q", staging.Name, "staging")
+	}
+	if staging.MaxRunners != 5 {
+		t.Errorf("groups[1].max_runners = %d, want 5", staging.MaxRunners)
+	}
+
+	// Health.
+	if !cfg.Health.Enabled {
+		t.Error("health.enabled = false, want true")
+	}
+	if cfg.Health.CheckInterval.Duration != 1*time.Minute {
+		t.Errorf("health.check_interval = %v, want 1m", cfg.Health.CheckInterval.Duration)
+	}
+	if cfg.Health.RunnerTimeout.Duration != 3*time.Hour {
+		t.Errorf("health.runner_timeout = %v, want 3h", cfg.Health.RunnerTimeout.Duration)
+	}
+	if cfg.Health.IdleTimeout.Duration != 30*time.Minute {
+		t.Errorf("health.idle_timeout = %v, want 30m", cfg.Health.IdleTimeout.Duration)
+	}
+	if cfg.Health.DivergenceTimeout.Duration != 10*time.Minute {
+		t.Errorf("health.divergence_timeout = %v, want 10m", cfg.Health.DivergenceTimeout.Duration)
+	}
+	if cfg.Health.MaxConsecutiveFailures != 10 {
+		t.Errorf("health.max_consecutive_failures = %d, want 10", cfg.Health.MaxConsecutiveFailures)
+	}
+	if cfg.Health.FailureCooldown.Duration != 2*time.Minute {
+		t.Errorf("health.failure_cooldown = %v, want 2m", cfg.Health.FailureCooldown.Duration)
+	}
+	if cfg.Health.MinDiskSpace != "5GB" {
+		t.Errorf("health.min_disk_space = %q, want %q", cfg.Health.MinDiskSpace, "5GB")
+	}
+
+	// Logging.
+	if cfg.Logging.Level != "debug" {
+		t.Errorf("logging.level = %q, want %q", cfg.Logging.Level, "debug")
+	}
+	if cfg.Logging.Format != "json" {
+		t.Errorf("logging.format = %q, want %q", cfg.Logging.Format, "json")
+	}
+	if cfg.Logging.Dir != "/tmp/ghr-logs" {
+		t.Errorf("logging.dir = %q, want %q", cfg.Logging.Dir, "/tmp/ghr-logs")
+	}
+	if cfg.Logging.RetentionDays != 14 {
+		t.Errorf("logging.retention_days = %d, want 14", cfg.Logging.RetentionDays)
+	}
+	// RunnerOutput is a *bool, so an explicit runner_output: false is kept rather than defaulted to true.
+	if cfg.Logging.RunnerOutput == nil {
+		t.Error("logging.runner_output = nil, want false")
+	} else if *cfg.Logging.RunnerOutput {
+		t.Error("logging.runner_output = true, want false (explicitly set in YAML)")
+	}
+
+	// Notifications.
+	if !cfg.Notifications.Discord.Enabled {
+		t.Error("notifications.discord.enabled = false, want true")
+	}
+	if cfg.Notifications.Discord.Username != "my-bot" {
+		t.Errorf("notifications.discord.username = %q, want %q", cfg.Notifications.Discord.Username, "my-bot")
+	}
+	if len(cfg.Notifications.Discord.Events) != 2 {
+		t.Errorf("notifications.discord.events len = %d, want 2", len(cfg.Notifications.Discord.Events))
+	}
+	if cfg.Notifications.Discord.Mentions.Error != "<@&111>" {
+		t.Errorf("notifications.discord.mentions.error = %q, want %q", cfg.Notifications.Discord.Mentions.Error, "<@&111>")
+	}
+	if cfg.Notifications.Discord.Mentions.Critical != "<@&222>" {
+		t.Errorf("notifications.discord.mentions.critical = %q, want %q", cfg.Notifications.Discord.Mentions.Critical, "<@&222>")
+	}
+
+	// Monitoring.
+	if !cfg.Monitoring.UptimeKuma.Enabled {
+		t.Error("monitoring.uptime_kuma.enabled = false, want true")
+	}
+	if cfg.Monitoring.UptimeKuma.DegradedThreshold != 0.8 {
+		t.Errorf("monitoring.uptime_kuma.degraded_threshold = %f, want 0.8", cfg.Monitoring.UptimeKuma.DegradedThreshold)
+	}
+	if !cfg.Monitoring.UptimeKuma.ReportHealthAsPing {
+		t.Error("monitoring.uptime_kuma.report_health_as_ping = false, want true")
+	}
+
+	// Daemon.
+	if cfg.Daemon.StateDir != "/tmp/ghr-state" {
+		t.Errorf("daemon.state_dir = %q, want %q", cfg.Daemon.StateDir, "/tmp/ghr-state")
+	}
+	if cfg.Daemon.ShutdownTimeout.Duration != 1*time.Minute {
+		t.Errorf("daemon.shutdown_timeout = %v, want 1m", cfg.Daemon.ShutdownTimeout.Duration)
+	}
+}
+
+func TestLoad_ValidationErrors(t *testing.T) {
+	tests := []struct {
+		name      string
+		yaml      string
+		wantInErr string // substring expected in the error message
+	}{
+		{
+			name:      "no groups",
+			yaml:      `github: {url: "https://github.com"}`,
+			wantInErr: "at least one group is required",
+		},
+		{
+			name: "empty group name",
+			yaml: `
+groups:
+  - name: ""
+    max_runners: 1`,
+			wantInErr: "name is required",
+		},
+		{
+			name: "duplicate group names",
+			yaml: `
+groups:
+  - name: dup
+    max_runners: 1
+  - name: dup
+    max_runners: 2`,
+			wantInErr: "duplicate group name",
+		},
+		{
+			name: "max_runners less than 1",
+			yaml: `
+groups:
+  - name: grp
+    max_runners: 0`,
+			wantInErr: "max_runners must be >= 1",
+		},
+		{
+			name: "min_runners negative",
+			yaml: `
+groups:
+  - name: grp
+    max_runners: 5
+    min_runners: -1`,
+			wantInErr: "min_runners must be >= 0",
+		},
+		{
+			name: "min_runners greater than max_runners",
+			yaml: `
+groups:
+  - name: grp
+    max_runners: 2
+    min_runners: 5`,
+			wantInErr: "min_runners (5) must be <= max_runners (2)",
+		},
+		{
+			name: "empty label string",
+			yaml: `
+groups:
+  - name: grp
+    max_runners: 1
+    labels:
+      - ""`,
+			wantInErr: "labels[0] must not be empty",
+		},
+		{
+			name: "invalid logging level",
+			yaml: `
+logging:
+  level: "verbose"
+groups:
+  - name: grp
+    max_runners: 1`,
+			wantInErr: "logging.level must be one of",
+		},
+		{
+			name: "invalid logging format",
+			yaml: `
+logging:
+  format: "xml"
+groups:
+  - name: grp
+    max_runners: 1`,
+			wantInErr: "logging.format must be one of",
+		},
+		{
+			name: "check_interval too small",
+			yaml: `
+health:
+  check_interval: "2s"
+groups:
+  - name: grp
+    max_runners: 1`,
+			wantInErr: "health.check_interval must be >= 5s",
+		},
+		{
+			name: "runner_timeout too small",
+			yaml: `
+health:
+  runner_timeout: "30s"
+groups:
+  - name: grp
+    max_runners: 1`,
+			wantInErr: "health.runner_timeout must be >= 1m",
+		},
+		{
+			name: "shutdown_timeout too small",
+			yaml: `
+daemon:
+  shutdown_timeout: "2s"
+groups:
+  - name: grp
+    max_runners: 1`,
+			wantInErr: "daemon.shutdown_timeout must be >= 5s",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			_, err := Load(writeConfig(t, tt.yaml))
+			if err == nil {
+				t.Fatal("Load() expected error, got nil")
+			}
+			if !strings.Contains(err.Error(), tt.wantInErr) {
+				t.Errorf("error = %q, want substring %q", err.Error(), tt.wantInErr)
+			}
+		})
+	}
+}
+
+func TestLoad_DefaultPaths_NonRoot(t *testing.T) {
+	// The default paths asserted below apply only to non-root users; skip under root.
+	if os.Getuid() == 0 {
+		t.Skip("test requires non-root user")
+	}
+
+	yaml := `
+groups:
+  - name: grp
+    max_runners: 1
+`
+	cfg, err := Load(writeConfig(t, yaml))
+	if err != nil {
+		t.Fatalf("Load() unexpected error: %v", err)
+	}
+
+	home, err := os.UserHomeDir()
+	if err != nil {
+		t.Fatalf("UserHomeDir() error: %v", err)
+	}
+
+	expectedDataDir := filepath.Join(home, ".local", "share", "ghr")
+
+	if !strings.HasPrefix(cfg.Runner.CacheDir, expectedDataDir) {
+		t.Errorf("runner.cache_dir = %q, want prefix %q", cfg.Runner.CacheDir, expectedDataDir)
+	}
+	if !strings.HasPrefix(cfg.Runner.WorkdirBase, expectedDataDir) {
+		t.Errorf("runner.workdir_base = %q, want prefix %q", cfg.Runner.WorkdirBase, expectedDataDir)
+	}
+	if !strings.HasPrefix(cfg.Logging.Dir, expectedDataDir) {
+		t.Errorf("logging.dir = %q, want prefix %q", cfg.Logging.Dir, expectedDataDir)
+	}
+
+	expectedStateDir := filepath.Join(home, ".local", "state", "ghr")
+	if !strings.HasPrefix(cfg.Daemon.StateDir, expectedStateDir) {
+		t.Errorf("daemon.state_dir = %q, want prefix %q", cfg.Daemon.StateDir, expectedStateDir)
+	}
+}
+
+func TestLoad_FileNotFound(t *testing.T) {
+	_, err := Load("/nonexistent/path/config.yaml")
+	if err == nil {
+		t.Fatal("Load() expected error for non-existent file, got nil")
+	}
+	if !strings.Contains(err.Error(), "read config file") {
+		t.Errorf("error = %q, want substring %q", err.Error(), "read config file")
+	}
+}
+
+func TestLoad_InvalidYAML(t *testing.T) {
+	invalidYAML := `
+groups:
+  - name: test
+    max_runners: [[[invalid
+`
+	_, err := Load(writeConfig(t, invalidYAML))
+	if err == nil {
+		t.Fatal("Load() expected error for invalid YAML, got nil")
+	}
+	if !strings.Contains(err.Error(), "parse config file") {
+		t.Errorf("error = %q, want substring %q", err.Error(), "parse config file")
+	}
+}
+
+func TestLoad_DurationParsing(t *testing.T) {
+	yaml := `
+health:
+  check_interval: "30s"
+  runner_timeout: "5m"
+  idle_timeout: "2h"
+  divergence_timeout: "10m"
+  failure_cooldown: "90s"
+
+daemon:
+  shutdown_timeout: "1m30s"
+
+groups:
+  - name: grp
+    max_runners: 1
+`
+	cfg, err := Load(writeConfig(t, yaml))
+	if err != nil {
+		t.Fatalf("Load() unexpected error: %v", err)
+	}
+
+	tests := []struct {
+		name string
+		got  time.Duration
+		want time.Duration
+	}{
+		{"check_interval 30s", cfg.Health.CheckInterval.Duration, 30 * time.Second},
+		{"runner_timeout 5m", cfg.Health.RunnerTimeout.Duration, 5 * time.Minute},
+		{"idle_timeout 2h", cfg.Health.IdleTimeout.Duration, 2 * time.Hour},
+		{"divergence_timeout 10m", cfg.Health.DivergenceTimeout.Duration, 10 * time.Minute},
+		{"failure_cooldown 90s", cfg.Health.FailureCooldown.Duration, 90 * time.Second},
+		{"shutdown_timeout 1m30s", cfg.Daemon.ShutdownTimeout.Duration, time.Minute + 30*time.Second},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			if tt.got != tt.want {
+				t.Errorf("duration = %v, want %v", tt.got, tt.want)
+			}
+		})
+	}
+}
+
+func TestLoad_EnvVarResolution(t *testing.T) {
+	t.Setenv("GHR_DISCORD_WEBHOOK_URL", "https://discord.com/api/webhooks/test")
+	t.Setenv("GHR_UPTIME_KUMA_URL", "https://uptime.example.com/api/push/abc123")
+
+	yaml := `
+groups:
+  - name: grp
+    max_runners: 1
+`
+	cfg, err := Load(writeConfig(t, yaml))
+	if err != nil {
+		t.Fatalf("Load() unexpected error: %v", err)
+	}
+
+	if cfg.Notifications.Discord.WebhookURL != "https://discord.com/api/webhooks/test" {
+		t.Errorf("discord.webhook_url = %q, want %q", cfg.Notifications.Discord.WebhookURL, "https://discord.com/api/webhooks/test")
+	}
+	if cfg.Monitoring.UptimeKuma.BaseURL != "https://uptime.example.com/api/push/abc123" {
+		t.Errorf("uptime_kuma.base_url = %q, want %q", cfg.Monitoring.UptimeKuma.BaseURL, "https://uptime.example.com/api/push/abc123")
+	}
+}
diff --git a/internal/config/types.go b/internal/config/types.go
new file mode 100644
index 0000000..5cb4f24
--- /dev/null
+++ b/internal/config/types.go
@@ -0,0 +1,120 @@
+package config
+
+import (
+	"fmt"
+	"time"
+)
+
+type Config struct {
+	GitHub        GitHubConfig        `yaml:"github"`
+	Runner        RunnerConfig        `yaml:"runner"`
+	Groups        []GroupConfig       `yaml:"groups"`
+	Health        HealthConfig        `yaml:"health"`
+	Logging       LoggingConfig       `yaml:"logging"`
+	Notifications NotificationsConfig `yaml:"notifications"`
+	Monitoring    MonitoringConfig    `yaml:"monitoring"`
+	Daemon        DaemonConfig        `yaml:"daemon"`
+}
+
+type GitHubConfig struct {
+	URL         string `yaml:"url"`
+	RunnerGroup string `yaml:"runner_group"`
+}
+
+type RunnerConfig struct {
+	Version     string `yaml:"version"`
+	CacheDir    string `yaml:"cache_dir"`
+	WorkdirBase string `yaml:"workdir_base"`
+}
+
+type GroupConfig struct {
+	Name        string             `yaml:"name"`
+	MaxRunners  int                `yaml:"max_runners"`
+	MinRunners  int                `yaml:"min_runners"`
+	Labels      []string           `yaml:"labels"`
+	RunnerGroup string             `yaml:"runner_group"`
+	Version     string             `yaml:"version"`
+	Health      *GroupHealthConfig `yaml:"health,omitempty"`
+}
+
+type GroupHealthConfig struct {
+	RunnerTimeout Duration `yaml:"runner_timeout"`
+}
+
+type HealthConfig struct {
+	Enabled                bool     `yaml:"enabled"`
+	CheckInterval          Duration `yaml:"check_interval"`
+	RunnerTimeout          Duration `yaml:"runner_timeout"`
+	IdleTimeout            Duration `yaml:"idle_timeout"`
+	DivergenceTimeout      Duration `yaml:"divergence_timeout"`
+	MaxConsecutiveFailures int      `yaml:"max_consecutive_failures"`
+	FailureCooldown        Duration `yaml:"failure_cooldown"`
+	MinDiskSpace           string   `yaml:"min_disk_space"`
+}
+
+type LoggingConfig struct {
+	Level         string `yaml:"level"`
+	Format        string `yaml:"format"`
+	Dir           string `yaml:"dir"`
+	RetentionDays int    `yaml:"retention_days"`
+	RunnerOutput  *bool  `yaml:"runner_output"`
+}
+
+type NotificationsConfig struct {
+	Discord DiscordConfig `yaml:"discord"`
+}
+
+type DiscordConfig struct {
+	Enabled    bool     `yaml:"enabled"`
+	WebhookURL string   `yaml:"-"` // never read from YAML; Load fills it from GHR_DISCORD_WEBHOOK_URL
+	Events     []string `yaml:"events"`
+	Username   string   `yaml:"username"`
+	AvatarURL  string   `yaml:"avatar_url"`
+	Mentions   struct {
+		Error    string `yaml:"error"`
+		Critical string `yaml:"critical"`
+	} `yaml:"mentions"`
+}
+
+type MonitoringConfig struct {
+	UptimeKuma UptimeKumaConfig `yaml:"uptime_kuma"`
+}
+
+type UptimeKumaConfig struct {
+	Enabled            bool              `yaml:"enabled"`
+	BaseURL            string            `yaml:"-"` // never read from YAML; Load fills it from GHR_UPTIME_KUMA_URL
+	DaemonToken        string            `yaml:"-"`
+	GroupTokens        map[string]string `yaml:"-"`
+	DegradedThreshold  float64           `yaml:"degraded_threshold"`
+	ReportHealthAsPing bool              `yaml:"report_health_as_ping"`
+}
+
+type DaemonConfig struct {
+	StateDir        string   `yaml:"state_dir"`
+	ShutdownTimeout Duration `yaml:"shutdown_timeout"`
+}
+
+type Duration struct {
+	time.Duration
+}
+
+func (d *Duration) UnmarshalYAML(unmarshal func(interface{}) error) error {
+	var s string
+	if err := unmarshal(&s); err != nil {
+		return fmt.Errorf("unmarshaling duration: %w", err)
+	}
+	if s == "" || s == "0" { // "" means unset; time.ParseDuration would reject it
+		d.Duration = 0
+		return nil
+	}
+	dur, err := time.ParseDuration(s)
+	if err != nil {
+		return fmt.Errorf("invalid duration %q: %w", s, err)
+	}
+	d.Duration = dur
+	return nil
+}
+
+func (d Duration) MarshalYAML() (interface{}, error) {
+	return d.String(), nil
+}
diff --git a/internal/config/validate.go b/internal/config/validate.go
new file mode 100644
index 0000000..598ca32
--- /dev/null
+++ b/internal/config/validate.go
@@ -0,0 +1,92 @@
+package config
+
+import (
+	"errors"
+	"fmt"
+	"time"
+)
+
+func validate(cfg *Config) error {
+	var errs []error
+
+	if len(cfg.Groups) == 0 {
+		errs = append(errs, errors.New("at least one group is required"))
+	}
+
+	seenNames := make(map[string]bool, len(cfg.Groups))
+
+	for i, g := range cfg.Groups {
+		prefix := fmt.Sprintf("groups[%d]", i)
+
+		switch {
+		case g.Name == "":
+			errs = append(errs, fmt.Errorf("%s: name is required", prefix))
+		case seenNames[g.Name]:
+			errs = append(errs, fmt.Errorf("%s: duplicate group name %q", prefix, g.Name))
+		default:
+			seenNames[g.Name] = true
+		}
+
+		if g.MaxRunners < 1 {
+			errs = append(errs, fmt.Errorf("%s (%s): max_runners must be >= 1", prefix, g.Name))
+		}
+
+		if g.MinRunners < 0 {
+			errs = append(errs, fmt.Errorf("%s (%s): min_runners must be >= 0", prefix, g.Name))
+		}
+
+		if g.MinRunners > g.MaxRunners {
+			errs = append(errs, fmt.Errorf("%s (%s): min_runners (%d) must be <= max_runners (%d)", prefix, g.Name, g.MinRunners, g.MaxRunners))
+		}
+
+		for j, label := range g.Labels {
+			if label == "" {
+				errs = append(errs, fmt.Errorf("%s (%s): labels[%d] must not be empty", prefix, g.Name, j))
+			}
+		}
+	}
+
+	if cfg.Health.CheckInterval.Duration > 0 && cfg.Health.CheckInterval.Duration < 5*time.Second {
+		errs = append(errs, fmt.Errorf("health.check_interval must be >= 5s, got %s", cfg.Health.CheckInterval.Duration))
+	}
+	if cfg.Health.RunnerTimeout.Duration > 0 && cfg.Health.RunnerTimeout.Duration < 1*time.Minute {
+		errs = append(errs, fmt.Errorf("health.runner_timeout must be >= 1m, got %s", cfg.Health.RunnerTimeout.Duration))
+	}
+	if cfg.Daemon.ShutdownTimeout.Duration > 0 && cfg.Daemon.ShutdownTimeout.Duration < 5*time.Second {
+		errs = append(errs, fmt.Errorf("daemon.shutdown_timeout must be >= 5s, got %s", cfg.Daemon.ShutdownTimeout.Duration))
+	}
+
+	if cfg.Health.MinDiskSpace != "" {
+		if _, parseErr := ParseByteSize(cfg.Health.MinDiskSpace); parseErr != nil {
+			errs = append(errs, fmt.Errorf("health.min_disk_space: %w", parseErr))
+		}
+	}
+
+	switch cfg.Logging.Level {
+	case "debug", "info", "warn", "error":
+	default:
+		errs = append(errs, fmt.Errorf("logging.level must be one of: debug, info, warn, error; got %q", cfg.Logging.Level))
+	}
+
+	switch cfg.Logging.Format {
+	case "text", "json":
+	default:
+		errs = append(errs, fmt.Errorf("logging.format must be one of: text, json; got %q", cfg.Logging.Format))
+	}
+
+	if len(errs) > 0 {
+		return errors.Join(errs...)
+	}
+	return nil
+}
+
+func isHealthZero(h HealthConfig) bool {
+	return !h.Enabled &&
+		h.CheckInterval.Duration == 0 &&
+		h.RunnerTimeout.Duration == 0 &&
+		h.IdleTimeout.Duration == 0 &&
+		h.DivergenceTimeout.Duration == 0 &&
+		h.MaxConsecutiveFailures == 0 &&
+		h.FailureCooldown.Duration == 0 &&
+		h.MinDiskSpace == ""
+}
diff --git a/internal/controller/controller.go b/internal/controller/controller.go
new file mode 100644
index 0000000..100da0d
--- /dev/null
+++ b/internal/controller/controller.go
@@ -0,0 +1,149 @@
+package controller
+
+import (
+	"context"
+	"fmt"
+	"log/slog"
+	"sync"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/config"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/logging"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/runner"
+	"github.com/actions/scaleset"
+	"github.com/actions/scaleset/listener"
+)
+
+type scaleSetClient interface {
+	GetScaleSet(ctx context.Context, runnerGroupID int, name string) (*scaleset.RunnerScaleSet, error)
+	CreateScaleSet(ctx context.Context, name string, runnerGroupID int, labels []string) (*scaleset.RunnerScaleSet, error)
+	DeleteScaleSet(ctx context.Context, id int) error
+	GenerateJITConfig(ctx context.Context, scaleSetID int, runnerName string) (string, error)
+	OpenSession(ctx context.Context, scaleSetID int, owner string) (*scaleset.MessageSessionClient, error)
+	NewListener(session *scaleset.MessageSessionClient, scaleSetID int, maxRunners int) (*listener.Listener, error)
+}
+
+type notifier interface {
+	Notify(ctx context.Context, event *model.Event)
+}
+
+type ControllerConfig struct {
+	RunnerVersion string
+	RunnerGroupID int
+}
+
+type GroupController struct {
+	client    scaleSetClient
+	binary    *runner.BinaryManager
+	process   *runner.ProcessManager
+	notifier  notifier
+	logMgr    *logging.LogManager
+	groups    []config.GroupConfig
+	globalCfg ControllerConfig
+	logger    *slog.Logger
+
+	mu      sync.Mutex
+	scalers map[string]*MacOSScaler
+}
+
+func New(
+	client scaleSetClient,
+	binary *runner.BinaryManager,
+	process *runner.ProcessManager,
+	notifier notifier,
+	logMgr *logging.LogManager,
+	groups []config.GroupConfig,
+	globalCfg ControllerConfig,
+	logger *slog.Logger,
+) *GroupController {
+	return &GroupController{
+		client:    client,
+		binary:    binary,
+		process:   process,
+		notifier:  notifier,
+		logMgr:    logMgr,
+		groups:    groups,
+		globalCfg: globalCfg,
+		logger:    logger,
+		scalers:   make(map[string]*MacOSScaler),
+	}
+}
+
+func (c *GroupController) Run(ctx context.Context) error {
+	var wg sync.WaitGroup
+	errCh := make(chan error, len(c.groups))
+
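+	for i := range c.groups {
+		wg.Add(1)
+		go func(group *config.GroupConfig) {
+			defer wg.Done()
+			if err := c.runGroup(ctx, group); err != nil {
+				errCh <- err
+			}
+		}(&c.groups[i]) // element pointer, not the loop variable, so it is safe before Go 1.22 too
+	}
+
+	<-ctx.Done() // blocks until cancellation, even if every group has already returned
+	wg.Wait()
+	close(errCh)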
+
+	for err := range errCh {
+		if err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+func (c *GroupController) Shutdown(ctx context.Context) {
+	c.mu.Lock()
+	scalers := make(map[string]*MacOSScaler, len(c.scalers))
+	for k, v := range c.scalers {
+		scalers[k] = v
+	}
+	c.mu.Unlock()
+
+	for name, s := range scalers {
+		c.logger.InfoContext(ctx, "shutting down scaler", "group", name)
+		s.Shutdown(ctx)
+	}
+}
+
+func (c *GroupController) Snapshots() map[string][]model.RunnerSnapshot {
+	c.mu.Lock()
+	scalers := make(map[string]*MacOSScaler, len(c.scalers))
+	for k, v := range c.scalers {
+		scalers[k] = v
+	}
+	c.mu.Unlock()
+
+	result := make(map[string][]model.RunnerSnapshot, len(scalers))
+	for name, s := range scalers {
+		result[name] = s.Snapshots()
+	}
+	return result
+}
+
+func (c *GroupController) KillRunner(ctx context.Context, group, runnerName string) error {
+	c.mu.Lock()
+	s, ok := c.scalers[group]
+	c.mu.Unlock()
+
+	if !ok {
+		return fmt.Errorf("kill runner %s: group %q not found", runnerName, group)
+	}
+
+	return s.killRunner(ctx, runnerName)
+}
+
+func (c *GroupController) registerScaler(name string, s *MacOSScaler) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	c.scalers[name] = s
+}
+
+func (c *GroupController) unregisterScaler(name string) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	delete(c.scalers, name)
+}
diff --git a/internal/controller/group.go b/internal/controller/group.go
new file mode 100644
index 0000000..4ba23ea
--- /dev/null
+++ b/internal/controller/group.go
@@ -0,0 +1,191 @@
+package controller
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"os"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/config"
+	"github.com/actions/scaleset"
+)
+
+const (
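+	backoffMin = 2 * time.Second  // initial delay before retrying a failed group listener
+	backoffMax = 30 * time.Second // upper bound for the doubled delay (see nextBackoff)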
+)
+
+func (c *GroupController) runGroup(ctx context.Context, group *config.GroupConfig) error {
+	version := group.Version
+	if version == "" {
+		version = c.globalCfg.RunnerVersion
+	}
+
+	cachedDir, err := c.binary.EnsureBits(ctx, version)
+	if err != nil {
+		return fmt.Errorf("ensure runner bits for group %q: %w", group.Name, err)
+	}
+
+	groupLogger, err := c.logMgr.GroupLogger(group.Name)
+	if err != nil {
+		return fmt.Errorf("create group logger for %q: %w", group.Name, err)
+	}
+
+	labels := deduplicateLabels(group.Name, group.Labels)
+
+	backoff := backoffMin
+	for {
+		err := c.runGroupOnce(ctx, group, cachedDir, labels, groupLogger)
+		if err == nil || errors.Is(err, context.Canceled) {
+			return nil
+		}
+
+		groupLogger.ErrorContext(ctx, "group listener failed, retrying",
+			"group", group.Name,
+			"error", err,
+			"backoff", backoff,
+		)
+
+		select {
+		case <-ctx.Done():
+			return nil
+		case <-time.After(backoff):
+		}
+
+		backoff = nextBackoff(backoff)
+	}
+}
+
+func (c *GroupController) runGroupOnce(
+	ctx context.Context,
+	group *config.GroupConfig,
+	cachedDir string,
+	labels []string,
+	groupLogger *slog.Logger,
+) error {
+	ss, err := c.resolveScaleSet(ctx, group.Name, labels)
+	if err != nil {
+		return fmt.Errorf("resolve scale set %q: %w", group.Name, err)
+	}
+
+	hostname, err := os.Hostname()
+	if err != nil {
+		hostname = "unknown"
+	}
+
+	session, err := c.client.OpenSession(ctx, ss.ID, hostname)
+	if err != nil {
+		return fmt.Errorf("open session for %q: %w", group.Name, err)
+	}
+	defer func() {
+		closeCtx := context.WithoutCancel(ctx)
+		if closeErr := session.Close(closeCtx); closeErr != nil {
+			groupLogger.DebugContext(ctx, "session close",
+				"group", group.Name,
+				"error", closeErr,
+			)
+		}
+	}()
+
+	scaler := NewMacOSScaler(
+		c.client, c.process, c.logMgr, c.notifier,
+		ss.ID, group.Name, group.MaxRunners, group.MinRunners,
+		cachedDir, groupLogger,
+	)
+	c.registerScaler(group.Name, scaler)
+
+	l, err := c.client.NewListener(session, ss.ID, group.MaxRunners)
+	if err != nil {
+		c.unregisterScaler(group.Name)
+		return fmt.Errorf("create listener for %q: %w", group.Name, err)
+	}
+
+	groupLogger.InfoContext(ctx, "group listener started",
+		"group", group.Name,
+		"scale_set_id", ss.ID,
+	)
+
+	listenerErr := l.Run(ctx, scaler)
+
+	c.unregisterScaler(group.Name)
+
+	if errors.Is(listenerErr, context.Canceled) {
+		cleanupCtx := context.WithoutCancel(ctx) // ctx is already cancelled here
+		scaler.Shutdown(cleanupCtx)
+		deleteErr := c.client.DeleteScaleSet(cleanupCtx, ss.ID)
+		if deleteErr != nil {
+			groupLogger.WarnContext(ctx, "failed to delete scale set on shutdown",
+				"group", group.Name,
+				"scale_set_id", ss.ID,
+				"error", deleteErr,
+			)
+		}
+		return context.Canceled
+	}
+
+	return listenerErr
+}
+
+func (c *GroupController) resolveScaleSet(ctx context.Context, name string, labels []string) (*resolvedScaleSet, error) {
+	ss, err := c.client.GetScaleSet(ctx, c.globalCfg.RunnerGroupID, name)
+	if err == nil && ss != nil {
+		if labelsChanged(ss.Labels, labels) {
+			c.logger.WarnContext(ctx, "scale set label mismatch detected, delete and recreate to update",
+				"group", name,
+				"scale_set_id", ss.ID,
+			)
+		}
+		return &resolvedScaleSet{ID: ss.ID, Name: ss.Name}, nil
+	}
+
+	ss, err = c.client.CreateScaleSet(ctx, name, c.globalCfg.RunnerGroupID, labels)
+	if err != nil {
+		return nil, fmt.Errorf("create scale set %q: %w", name, err)
+	}
+	return &resolvedScaleSet{ID: ss.ID, Name: ss.Name}, nil
+}
+
+func labelsChanged(existing []scaleset.Label, desired []string) bool {
+	if len(existing) != len(desired) {
+		return true
+	}
+	have := make(map[string]struct{}, len(existing))
+	for _, l := range existing {
+		have[l.Name] = struct{}{}
+	}
+	for _, d := range desired {
+		if _, ok := have[d]; !ok {
+			return true
+		}
+	}
+	return false
+}
+
+type resolvedScaleSet struct {
+	ID   int
+	Name string
+}
+
+func deduplicateLabels(groupName string, extra []string) []string {
+	seen := make(map[string]struct{}, len(extra)+1)
+	result := make([]string, 0, len(extra)+1)
+
+	for _, label := range append([]string{groupName}, extra...) {
+		if _, ok := seen[label]; ok {
+			continue
+		}
+		seen[label] = struct{}{}
+		result = append(result, label)
+	}
+	return result
+}
+
+func nextBackoff(current time.Duration) time.Duration {
+	next := current * 2
+	if next > backoffMax {
+		return backoffMax
+	}
+	return next
+}
diff --git a/internal/controller/kill_runner_test.go b/internal/controller/kill_runner_test.go
new file mode 100644
index 0000000..822979f
--- /dev/null
+++ b/internal/controller/kill_runner_test.go
@@ -0,0 +1,45 @@
+package controller
+
+import (
+	"context"
+	"log/slog"
+	"os"
+	"testing"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/runner"
+)
+
+func testLogger() *slog.Logger {
+	return slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError + 1}))
+}
+
+func TestKillRunner_GroupNotFound(t *testing.T) {
+	c := &GroupController{
+		scalers: make(map[string]*MacOSScaler),
+		logger:  testLogger(),
+	}
+
+	err := c.KillRunner(context.Background(), "missing-group", "r1")
+	if err == nil {
+		t.Fatal("expected error for missing group")
+	}
+}
+
+func TestKillRunner_RunnerNotFound(t *testing.T) {
+	scaler := &MacOSScaler{
+		groupName: "group-a",
+		idle:      make(map[string]*runner.Process),
+		busy:      make(map[string]*runner.Process),
+		logger:    testLogger(),
+	}
+
+	c := &GroupController{
+		scalers: map[string]*MacOSScaler{"group-a": scaler},
+		logger:  testLogger(),
+	}
+
+	err := c.KillRunner(context.Background(), "group-a", "r-nonexistent")
+	if err == nil {
+		t.Fatal("expected error for missing runner")
+	}
+}
diff --git a/internal/controller/scaler.go b/internal/controller/scaler.go
new file mode 100644
index 0000000..fe3e45a
--- /dev/null
+++ b/internal/controller/scaler.go
@@ -0,0 +1,186 @@
+package controller
+
+import (
+	"context"
+	"fmt"
+	"io"
+	"log/slog"
+	"sync"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/logging"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/runner"
+	"github.com/actions/scaleset"
+)
+
+type runnerStarter interface {
+	Prepare(ctx context.Context, instance *model.RunnerInstance, cachedDir string) (string, error)
+	Start(ctx context.Context, instance *model.RunnerInstance, workdir, jitConfig string, logFile io.Writer) (*runner.Process, error)
+	Stop(ctx context.Context, proc *runner.Process) error
+	Cleanup(proc *runner.Process) error
+}
+
+type MacOSScaler struct {
+	client     scaleSetClient
+	process    runnerStarter
+	logMgr     *logging.LogManager
+	notifier   notifier
+	scaleSetID int
+	groupName  string
+	maxRunners int
+	minRunners int
+	cachedDir  string
+	logger     *slog.Logger
+
+	mu   sync.Mutex
+	idle map[string]*runner.Process
+	busy map[string]*runner.Process
+}
+
+func NewMacOSScaler(
+	client scaleSetClient,
+	process runnerStarter,
+	logMgr *logging.LogManager,
+	notifier notifier,
+	scaleSetID int,
+	groupName string,
+	maxRunners int,
+	minRunners int,
+	cachedDir string,
+	logger *slog.Logger,
+) *MacOSScaler {
+	return &MacOSScaler{
+		client:     client,
+		process:    process,
+		logMgr:     logMgr,
+		notifier:   notifier,
+		scaleSetID: scaleSetID,
+		groupName:  groupName,
+		maxRunners: maxRunners,
+		minRunners: minRunners,
+		cachedDir:  cachedDir,
+		logger:     logger,
+		idle:       make(map[string]*runner.Process),
+		busy:       make(map[string]*runner.Process),
+	}
+}
+
+func (s *MacOSScaler) HandleDesiredRunnerCount(ctx context.Context, count int) (int, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+
+	target := s.minRunners + count // minRunners baseline plus the count requested by the listener
+	if target > s.maxRunners {
+		target = s.maxRunners
+	}
+
+	current := len(s.idle) + len(s.busy)
+	for i := 0; i < target-current; i++ {
+		if err := s.startRunner(ctx); err != nil {
+			s.logger.ErrorContext(ctx, "failed to start runner",
+				"group", s.groupName,
+				"error", err,
+			)
+		}
+	}
+
+	return len(s.idle) + len(s.busy), nil
+}
+
+func (s *MacOSScaler) HandleJobStarted(ctx context.Context, jobInfo *scaleset.JobStarted) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+
+	proc, ok := s.idle[jobInfo.RunnerName]
+	if !ok {
+		s.logger.WarnContext(ctx, "job started for unknown runner",
+			"runner", jobInfo.RunnerName,
+			"group", s.groupName,
+		)
+		return nil
+	}
+
+	delete(s.idle, jobInfo.RunnerName)
+	s.busy[jobInfo.RunnerName] = proc
+
+	s.logger.InfoContext(ctx, "job started",
+		"runner", jobInfo.RunnerName,
+		"group", s.groupName,
+		"job", jobInfo.JobDisplayName,
+	)
+
+	s.notifier.Notify(ctx, &model.Event{
+		Type:      model.EventRunnerStarted,
+		Level:     model.LevelInfo,
+		Group:     s.groupName,
+		Runner:    jobInfo.RunnerName,
+		Message:   fmt.Sprintf("Job started: %s", jobInfo.JobDisplayName),
+		Timestamp: time.Now(),
+	})
+
+	return nil
+}
+
+func (s *MacOSScaler) HandleJobCompleted(ctx context.Context, jobInfo *scaleset.JobCompleted) error {
+	s.mu.Lock()
+	proc := s.busy[jobInfo.RunnerName]
+	if proc == nil {
+		proc = s.idle[jobInfo.RunnerName]
+	}
+	delete(s.busy, jobInfo.RunnerName)
+	delete(s.idle, jobInfo.RunnerName)
+	s.mu.Unlock()
+
+	if proc != nil {
+		stopErr := s.process.Stop(ctx, proc)
+		if stopErr != nil {
+			s.logger.WarnContext(ctx, "failed to stop runner",
+				"runner", jobInfo.RunnerName,
+				"error", stopErr,
+			)
+		}
+		cleanupErr := s.process.Cleanup(proc)
+		if cleanupErr != nil {
+			s.logger.WarnContext(ctx, "failed to cleanup runner",
+				"runner", jobInfo.RunnerName,
+				"error", cleanupErr,
+			)
+		}
+	} else {
+		s.logger.WarnContext(ctx, "job completed for unknown runner",
+			"runner", jobInfo.RunnerName,
+			"group", s.groupName,
+		)
+	}
+
+	eventType := model.EventRunnerCompleted
+	if jobInfo.Result != "succeeded" {
+		eventType = model.EventRunnerFailed
+	}
+
+	logArgs := []any{
+		"runner", jobInfo.RunnerName,
+		"group", s.groupName,
+		"result", jobInfo.Result,
+	}
+	if !jobInfo.FinishTime.IsZero() && !jobInfo.RunnerAssignTime.IsZero() {
+		logArgs = append(logArgs, "duration", jobInfo.FinishTime.Sub(jobInfo.RunnerAssignTime).String())
+	}
+	if !jobInfo.QueueTime.IsZero() && !jobInfo.RunnerAssignTime.IsZero() {
+		logArgs = append(logArgs, "queue_wait", jobInfo.RunnerAssignTime.Sub(jobInfo.QueueTime).String())
+	}
+
+	s.logger.InfoContext(ctx, "job completed", logArgs...)
+
+	s.notifier.Notify(ctx, &model.Event{
+		Type:      eventType,
+		Level:     model.LevelInfo,
+		Group:     s.groupName,
+		Runner:    jobInfo.RunnerName,
+		Message:   fmt.Sprintf("Job completed: %s", jobInfo.Result),
+		Timestamp: time.Now(),
+	})
+
+	return nil
+}
diff --git a/internal/controller/scaler_ops.go b/internal/controller/scaler_ops.go
new file mode 100644
index 0000000..cee674c
--- /dev/null
+++ b/internal/controller/scaler_ops.go
@@ -0,0 +1,144 @@
+package controller
+
+import (
+	"context"
+	"crypto/rand"
+	"encoding/hex"
+	"fmt"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/runner"
+)
+
+func (s *MacOSScaler) startRunner(ctx context.Context) error {
+	randBytes := make([]byte, 4)
+	if _, err := rand.Read(randBytes); err != nil {
+		return fmt.Errorf("generate runner ID: %w", err)
+	}
+	id := hex.EncodeToString(randBytes)
+	name := fmt.Sprintf("%s-%s", s.groupName, id)
+
+	jitConfig, err := s.client.GenerateJITConfig(ctx, s.scaleSetID, name)
+	if err != nil {
+		return fmt.Errorf("generate JIT config for %q: %w", name, err)
+	}
+
+	instance := model.RunnerInstance{
+		ID:    id,
+		Name:  name,
+		Group: s.groupName,
+	}
+
+	workdir, err := s.process.Prepare(ctx, &instance, s.cachedDir)
+	if err != nil {
+		return fmt.Errorf("prepare runner %q: %w", name, err)
+	}
+
+	logFile, err := s.logMgr.RunnerOutputFile(s.groupName, name)
+	if err != nil {
+		return fmt.Errorf("open runner log for %q: %w", name, err)
+	}
+
+	proc, err := s.process.Start(ctx, &instance, workdir, jitConfig, logFile)
+	if err != nil {
+		return fmt.Errorf("start runner %q: %w", name, err)
+	}
+
+	s.idle[name] = proc // caller must hold s.mu (HandleDesiredRunnerCount does)
+
+	s.logger.InfoContext(ctx, "runner provisioned",
+		"runner", name,
+		"group", s.groupName,
+		"pid", proc.PID,
+	)
+
+	return nil
+}
+
+func (s *MacOSScaler) killRunner(ctx context.Context, runnerName string) error {
+	s.mu.Lock()
+	proc := s.idle[runnerName]
+	if proc == nil {
+		proc = s.busy[runnerName]
+	}
+	delete(s.idle, runnerName)
+	delete(s.busy, runnerName)
+	s.mu.Unlock()
+
+	if proc == nil {
+		return fmt.Errorf("runner %q not found in group %q", runnerName, s.groupName)
+	}
+
+	stopErr := s.process.Stop(ctx, proc)
+	if stopErr != nil {
+		s.logger.WarnContext(ctx, "failed to stop runner during kill",
+			"runner", runnerName,
+			"error", stopErr,
+		)
+	}
+
+	cleanupErr := s.process.Cleanup(proc)
+	if cleanupErr != nil {
+		return fmt.Errorf("cleanup runner %q: %w", runnerName, cleanupErr)
+	}
+
+	s.logger.InfoContext(ctx, "killed runner", "runner", runnerName, "group", s.groupName)
+	return nil
+}
+
+func (s *MacOSScaler) Shutdown(ctx context.Context) {
+	s.mu.Lock()
+	allProcs := make([]*runner.Process, 0, len(s.idle)+len(s.busy))
+	for _, p := range s.idle {
+		allProcs = append(allProcs, p)
+	}
+	for _, p := range s.busy {
+		allProcs = append(allProcs, p)
+	}
+	s.idle = make(map[string]*runner.Process)
+	s.busy = make(map[string]*runner.Process)
+	s.mu.Unlock()
+
+	for _, proc := range allProcs {
+		stopErr := s.process.Stop(ctx, proc)
+		if stopErr != nil {
+			s.logger.WarnContext(ctx, "failed to stop runner during shutdown",
+				"runner", proc.Name,
+				"error", stopErr,
+			)
+		}
+		cleanupErr := s.process.Cleanup(proc)
+		if cleanupErr != nil {
+			s.logger.WarnContext(ctx, "failed to cleanup runner during shutdown",
+				"runner", proc.Name,
+				"error", cleanupErr,
+			)
+		}
+	}
+}
+
+func (s *MacOSScaler) Snapshots() []model.RunnerSnapshot {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+
+	snapshots := make([]model.RunnerSnapshot, 0, len(s.idle)+len(s.busy))
+	for name, proc := range s.idle {
+		snapshots = append(snapshots, model.RunnerSnapshot{
+			Name:      name,
+			Group:     s.groupName,
+			State:     "idle",
+			PID:       proc.PID,
+			StartedAt: proc.StartedAt,
+		})
+	}
+	for name, proc := range s.busy {
+		snapshots = append(snapshots, model.RunnerSnapshot{
+			Name:      name,
+			Group:     s.groupName,
+			State:     "busy",
+			PID:       proc.PID,
+			StartedAt: proc.StartedAt,
+		})
+	}
+	return snapshots
+}
diff --git a/internal/controller/scaler_test.go b/internal/controller/scaler_test.go
new file mode 100644
index 0000000..a725a39
--- /dev/null
+++ b/internal/controller/scaler_test.go
@@ -0,0 +1,229 @@
+package controller
+
+import (
+	"context"
+	"testing"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/runner"
+	"github.com/actions/scaleset"
+)
+
+type mockNotifier struct {
+	events []model.Event
+}
+
+func (m *mockNotifier) Notify(_ context.Context, event *model.Event) {
+	m.events = append(m.events, *event)
+}
+
+func newTestScaler(opts ...func(*MacOSScaler)) *MacOSScaler {
+	s := &MacOSScaler{
+		groupName:  "test-group",
+		maxRunners: 5,
+		minRunners: 0,
+		logger:     testLogger(),
+		notifier:   &mockNotifier{},
+		idle:       make(map[string]*runner.Process),
+		busy:       make(map[string]*runner.Process),
+	}
+	for _, opt := range opts {
+		opt(s)
+	}
+	return s
+}
+
+func TestSnapshots(t *testing.T) {
+	now := time.Date(2026, 1, 15, 12, 0, 0, 0, time.UTC)
+
+	tests := []struct {
+		name     string
+		idle     map[string]*runner.Process
+		busy     map[string]*runner.Process
+		wantLen  int
+		wantIdle int
+		wantBusy int
+	}{
+		{
+			name:     "empty maps",
+			idle:     map[string]*runner.Process{},
+			busy:     map[string]*runner.Process{},
+			wantLen:  0,
+			wantIdle: 0,
+			wantBusy: 0,
+		},
+		{
+			name: "one idle one busy",
+			idle: map[string]*runner.Process{
+				"r-idle": {Name: "r-idle", Group: "test-group", PID: 100, StartedAt: now},
+			},
+			busy: map[string]*runner.Process{
+				"r-busy": {Name: "r-busy", Group: "test-group", PID: 200, StartedAt: now},
+			},
+			wantLen:  2,
+			wantIdle: 1,
+			wantBusy: 1,
+		},
+		{
+			name: "all idle",
+			idle: map[string]*runner.Process{
+				"r-1": {Name: "r-1", Group: "test-group", PID: 100, StartedAt: now},
+				"r-2": {Name: "r-2", Group: "test-group", PID: 101, StartedAt: now},
+			},
+			busy:     map[string]*runner.Process{},
+			wantLen:  2,
+			wantIdle: 2,
+			wantBusy: 0,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			s := newTestScaler(func(scaler *MacOSScaler) {
+				scaler.idle = tt.idle
+				scaler.busy = tt.busy
+			})
+
+			snapshots := s.Snapshots()
+			if len(snapshots) != tt.wantLen {
+				t.Fatalf("expected %d snapshots, got %d", tt.wantLen, len(snapshots))
+			}
+
+			idleCount := 0
+			busyCount := 0
+			for _, snap := range snapshots {
+				switch snap.State {
+				case "idle":
+					idleCount++
+				case "busy":
+					busyCount++
+				default:
+					t.Fatalf("unexpected state %q", snap.State)
+				}
+				if snap.Group != "test-group" {
+					t.Fatalf("expected group test-group, got %q", snap.Group)
+				}
+			}
+			if idleCount != tt.wantIdle {
+				t.Fatalf("expected %d idle, got %d", tt.wantIdle, idleCount)
+			}
+			if busyCount != tt.wantBusy {
+				t.Fatalf("expected %d busy, got %d", tt.wantBusy, busyCount)
+			}
+		})
+	}
+}
+
+func TestHandleDesiredRunnerCount_Noop(t *testing.T) {
+	now := time.Date(2026, 1, 15, 12, 0, 0, 0, time.UTC)
+
+	s := newTestScaler(func(scaler *MacOSScaler) {
+		scaler.minRunners = 0
+		scaler.maxRunners = 5
+		scaler.idle = map[string]*runner.Process{
+			"r-1": {Name: "r-1", Group: "test-group", PID: 100, StartedAt: now},
+			"r-2": {Name: "r-2", Group: "test-group", PID: 101, StartedAt: now},
+		}
+	})
+
+	got, err := s.HandleDesiredRunnerCount(context.Background(), 2)
+	if err != nil {
+		t.Fatalf("HandleDesiredRunnerCount: %v", err)
+	}
+
+	if got != 2 {
+		t.Fatalf("expected current count 2, got %d", got)
+	}
+}
+
+func TestHandleDesiredRunnerCount_CappedByMax(t *testing.T) {
+	now := time.Date(2026, 1, 15, 12, 0, 0, 0, time.UTC)
+
+	s := newTestScaler(func(scaler *MacOSScaler) {
+		scaler.minRunners = 0
+		scaler.maxRunners = 3
+		scaler.idle = map[string]*runner.Process{
+			"r-1": {Name: "r-1", Group: "test-group", PID: 100, StartedAt: now},
+			"r-2": {Name: "r-2", Group: "test-group", PID: 101, StartedAt: now},
+			"r-3": {Name: "r-3", Group: "test-group", PID: 102, StartedAt: now},
+		}
+	})
+
+	got, err := s.HandleDesiredRunnerCount(context.Background(), 10)
+	if err != nil {
+		t.Fatalf("HandleDesiredRunnerCount: %v", err)
+	}
+
+	if got != 3 {
+		t.Fatalf("expected count 3 (capped by max), got %d", got)
+	}
+}
+
+func TestHandleJobStarted_NotFound(t *testing.T) {
+	s := newTestScaler()
+
+	err := s.HandleJobStarted(context.Background(), &scaleset.JobStarted{
+		RunnerName: "unknown-runner",
+	})
+	if err != nil {
+		t.Fatalf("expected nil error for unknown runner, got %v", err)
+	}
+}
+
+func TestHandleJobStarted_MovesToBusy(t *testing.T) {
+	now := time.Date(2026, 1, 15, 12, 0, 0, 0, time.UTC)
+	proc := &runner.Process{Name: "r-1", Group: "test-group", PID: 100, StartedAt: now}
+
+	s := newTestScaler(func(scaler *MacOSScaler) {
+		scaler.idle = map[string]*runner.Process{"r-1": proc}
+	})
+
+	err := s.HandleJobStarted(context.Background(), &scaleset.JobStarted{
+		RunnerName: "r-1",
+	})
+	if err != nil {
+		t.Fatalf("HandleJobStarted: %v", err)
+	}
+
+	if _, ok := s.idle["r-1"]; ok {
+		t.Fatal("expected runner to be removed from idle")
+	}
+	if _, ok := s.busy["r-1"]; !ok {
+		t.Fatal("expected runner to be in busy")
+	}
+}
+
+func TestHandleJobCompleted_NotFound(t *testing.T) {
+	s := newTestScaler()
+
+	err := s.HandleJobCompleted(context.Background(), &scaleset.JobCompleted{
+		RunnerName: "unknown-runner",
+		Result:     "succeeded",
+	})
+	if err != nil {
+		t.Fatalf("expected nil error for unknown runner, got %v", err)
+	}
+}
+
+func TestHandleJobCompleted_NotifiesEvent(t *testing.T) {
+	n := &mockNotifier{}
+	s := newTestScaler(func(scaler *MacOSScaler) {
+		scaler.notifier = n
+	})
+
+	err := s.HandleJobCompleted(context.Background(), &scaleset.JobCompleted{
+		RunnerName: "unknown-runner",
+		Result:     "failed",
+	})
+	if err != nil {
+		t.Fatalf("HandleJobCompleted: %v", err)
+	}
+
+	if len(n.events) != 1 {
+		t.Fatalf("expected 1 notification event, got %d", len(n.events))
+	}
+	if n.events[0].Type != model.EventRunnerFailed {
+		t.Fatalf("expected event type %q, got %q", model.EventRunnerFailed, n.events[0].Type)
+	}
+}
diff --git a/internal/github/client.go b/internal/github/client.go
new file mode 100644
index 0000000..c27346d
--- /dev/null
+++ b/internal/github/client.go
@@ -0,0 +1,138 @@
+package github
+
+import (
+	"context"
+	"fmt"
+	"os"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+	"github.com/actions/scaleset"
+	"github.com/actions/scaleset/listener"
+)
+
+var systemInfo = scaleset.SystemInfo{
+	System:  "ghr",
+	Version: "2.0",
+}
+
+type Client struct {
+	inner *scaleset.Client
+}
+
+func NewClient(creds *auth.Credentials, githubURL string) (*Client, error) {
+	switch creds.Method {
+	case "pat":
+		return newPATClient(creds.PAT, githubURL)
+	case "github_app":
+		return newAppClient(creds.GitHubApp, githubURL)
+	default:
+		return nil, fmt.Errorf("new github client: unknown auth method %q", creds.Method)
+	}
+}
+
+func newPATClient(token, githubURL string) (*Client, error) {
+	inner, err := scaleset.NewClientWithPersonalAccessToken(scaleset.NewClientWithPersonalAccessTokenConfig{
+		GitHubConfigURL:     githubURL,
+		PersonalAccessToken: token,
+		SystemInfo:          systemInfo,
+	})
+	if err != nil {
+		return nil, fmt.Errorf("create PAT client: %w", err)
+	}
+	return &Client{inner: inner}, nil
+}
+
+func newAppClient(app *auth.GitHubAppCreds, githubURL string) (*Client, error) {
+	if app == nil {
+		return nil, fmt.Errorf("create app client: github_app credentials are nil")
+	}
+
+	pemBytes, err := os.ReadFile(app.PrivateKeyPath)
+	if err != nil {
+		return nil, fmt.Errorf("read private key %s: %w", app.PrivateKeyPath, err)
+	}
+
+	inner, err := scaleset.NewClientWithGitHubApp(scaleset.ClientWithGitHubAppConfig{
+		GitHubConfigURL: githubURL,
+		GitHubAppAuth: scaleset.GitHubAppAuth{
+			ClientID:       app.ClientID,
+			InstallationID: app.InstallationID,
+			PrivateKey:     string(pemBytes),
+		},
+		SystemInfo: systemInfo,
+	})
+	if err != nil {
+		return nil, fmt.Errorf("create app client: %w", err)
+	}
+	return &Client{inner: inner}, nil
+}
+
+func (c *Client) CreateScaleSet(ctx context.Context, name string, runnerGroupID int, labels []string) (*scaleset.RunnerScaleSet, error) {
+	sdkLabels := make([]scaleset.Label, len(labels))
+	for i, l := range labels {
+		sdkLabels[i] = scaleset.Label{Type: "System", Name: l}
+	}
+
+	ss, err := c.inner.CreateRunnerScaleSet(ctx, &scaleset.RunnerScaleSet{
+		Name:          name,
+		RunnerGroupID: runnerGroupID,
+		Labels:        sdkLabels,
+		RunnerSetting: scaleset.RunnerSetting{DisableUpdate: true},
+	})
+	if err != nil {
+		return nil, fmt.Errorf("create scale set %q: %w", name, err)
+	}
+	return ss, nil
+}
+
+func (c *Client) GetScaleSet(ctx context.Context, runnerGroupID int, name string) (*scaleset.RunnerScaleSet, error) {
+	ss, err := c.inner.GetRunnerScaleSet(ctx, runnerGroupID, name)
+	if err != nil {
+		return nil, fmt.Errorf("get scale set %q: %w", name, err)
+	}
+	return ss, nil
+}
+
+func (c *Client) GetScaleSetByID(ctx context.Context, id int) (*scaleset.RunnerScaleSet, error) {
+	ss, err := c.inner.GetRunnerScaleSetByID(ctx, id)
+	if err != nil {
+		return nil, fmt.Errorf("get scale set by id %d: %w", id, err)
+	}
+	return ss, nil
+}
+
+func (c *Client) DeleteScaleSet(ctx context.Context, id int) error {
+	if err := c.inner.DeleteRunnerScaleSet(ctx, id); err != nil {
+		return fmt.Errorf("delete scale set %d: %w", id, err)
+	}
+	return nil
+}
+
+func (c *Client) GenerateJITConfig(ctx context.Context, scaleSetID int, runnerName string) (string, error) {
+	jit, err := c.inner.GenerateJitRunnerConfig(ctx, &scaleset.RunnerScaleSetJitRunnerSetting{
+		Name: runnerName,
+	}, scaleSetID)
+	if err != nil {
+		return "", fmt.Errorf("generate JIT config for %q: %w", runnerName, err)
+	}
+	return jit.EncodedJITConfig, nil
+}
+
+func (c *Client) OpenSession(ctx context.Context, scaleSetID int, owner string) (*scaleset.MessageSessionClient, error) {
+	session, err := c.inner.MessageSessionClient(ctx, scaleSetID, owner)
+	if err != nil {
+		return nil, fmt.Errorf("open session for scale set %d: %w", scaleSetID, err)
+	}
+	return session, nil
+}
+
+func (c *Client) NewListener(session *scaleset.MessageSessionClient, scaleSetID, maxRunners int) (*listener.Listener, error) {
+	l, err := listener.New(session, listener.Config{
+		ScaleSetID: scaleSetID,
+		MaxRunners: maxRunners,
+	})
+	if err != nil {
+		return nil, fmt.Errorf("create listener for scale set %d: %w", scaleSetID, err)
+	}
+	return l, nil
+}
diff --git a/internal/github/client_test.go b/internal/github/client_test.go
new file mode 100644
index 0000000..6f39636
--- /dev/null
+++ b/internal/github/client_test.go
@@ -0,0 +1,143 @@
+package github
+
+import (
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/auth"
+)
+
+func TestNewClient_PAT(t *testing.T) {
+	creds := &auth.Credentials{
+		Method: "pat",
+		PAT:    "ghp_test1234567890",
+	}
+
+	client, err := NewClient(creds, "https://github.com/test-org")
+	if err != nil {
+		t.Fatalf("expected no error, got %v", err)
+	}
+	if client == nil {
+		t.Fatal("expected non-nil client")
+	}
+	if client.inner == nil {
+		t.Fatal("expected non-nil inner client")
+	}
+}
+
+func TestNewClient_GitHubApp(t *testing.T) {
+	keyContent := `-----BEGIN RSA PRIVATE KEY-----
+MIIEpAIBAAKCAQEA0Z3VS5JJcds3xfn/ygWyF8PbnGy0AHB7MhgHcTz6sE2I2yPB
+aNlRtQ8aXEr55FZgMvemuafJoqfiN2OkXvMPMID2KJHnfxJPMSdMoBRk7GkLVOH
+OBnG9gVmZ5A6iNFwHGO9BKnL7P7iCfxWJCFxdF0qNGBJjqMJjHb6cDAVJfb0Q5K
+xHE6UKJhne1RDmaoW/4Vh+M3OAv8MXPqp0qhBkJYYlTpjRkLjF2MOqMmGKO7UmB
+dVjr3HvaGRFnRlq5mzv2JlFjQFPXYiRgDrU/K3Y2MnsQfJGP7TW0j5FFsiZp7vTV
+P0WBLZEQy2mvVz9y9X78JiK64ijr4EDRqKi1NQIDAQABAoIBAC5RgZ+hBx7xHNaM
+pPgwGMnCd2vHsHwAaXkeAzSdRnLBDqPWJGJmaCF3B/cQHan5IMnVEL2T0KDiWqjh
+ax7GiMAPPkCgarSHMC4sPXTR0NHHZxC5bED5z98rIqabSChzmZjDe6FMqpljhdJR
+0K/gUVLqCRJjHNdGIFsmi2amEMGdlxEJmH3FvSmhaxAhIfxmSGNNEPzMCQl5mmFM
+OqoB3BtMdn/qxg9grs08PHshqJdH6QilaRy6KfDEuHpgMZav2RI7sjChTQaI+MUN
+FzkaOq1M2C17xjIT3vlQ3WJkQXZrYJP5FGGxgI2RfVROaGE8+BiFKzIGudPJ8NpB
+OCSrUmECgYEA7wO6fDL+S6YJdAJ8YTBOfNny/VkECzk2sxhvP5pKGp0tzGAYq3BM
+uRjdrR7Cj+cW1gi6DRezMX+r5jMXnBQkmRqyZ6u3r9XSvuEyiGmd+qNWm7iFt6FX
+3VdANYsl6xMOPNmAzKm0ZFb0J9J3BHL+F+1adij6YqN+OlTIRLpbpzUCgYEA4G/W
+9T1XT/dPIHr7PGBFuJ3vkLNU1ITk2LCPTCkghq9vFf+/F8RQ/eDa9fugVDJnHlMm
+qiFUWHfBmoANRrAQKbw8kN6E8Oij1F5Y09mW0fqzlMF1bRUxOJ0SXdyp8RIIYO9n
+g5UlD1UqRCsAWxJN7vE1VX/bZb3OIEQ0C+YfKkECgYBHPCA22lpjsJGIbgIEkk9Q
+Cm1WlCXBH7SgXBMoJwJfKSIqn4TRJ9RLfMqFLVTJDNIGdIkLUJPR78VR8qJwqifz
+LnGPEjMTIZEfHvJlUDI6dEe6n5ENZB9evRQ0MflIsNkGHQ0qzLGLPYGWmJ0TBy8J
+aIFZ1GfwBlSPI/4ffNV8bQKBgQCFDMcMJoB+urH7sMFEgH5P3fHEQHjfJNrDaBPM
+YCUWa8DTQD9/7HzIepcWKEVr4jSBK2D0B0sFqgHhD0UIc/WW7IQKyKlmEjz7oSR
+7YR2FUycBRTxZ6EmGlK5E67z1Q2FHeFJgIq2ip1Rb6VLFy8yAaDPxPQ8YIBNlQdp
+S+hkAQKBgQDR4LJibkXz+U/5MhQT+IhEVeEBH5fTkOD6oIOJHd17DMQ5mi+zBPf0
+hB+sQ+zl3lOKJGjTTqdapnJeT8v5JD1TvVCDBii6niUoR6TFB3qxaOjv/VEL1Cf3
+G5FadRKM/l54xfA+mEHxkO/nGxH7fBatEJRE3l6K9MmIq2gOMCF0MQ==
+-----END RSA PRIVATE KEY-----`
+
+	tmpDir := t.TempDir()
+	keyPath := filepath.Join(tmpDir, "test-key.pem")
+	if err := os.WriteFile(keyPath, []byte(keyContent), 0600); err != nil {
+		t.Fatalf("write test key: %v", err)
+	}
+
+	creds := &auth.Credentials{
+		Method: "github_app",
+		GitHubApp: &auth.GitHubAppCreds{
+			ClientID:       "Iv1.test123",
+			InstallationID: 12345,
+			PrivateKeyPath: keyPath,
+		},
+	}
+
+	client, err := NewClient(creds, "https://github.com/test-org")
+	if err != nil {
+		t.Fatalf("expected no error, got %v", err)
+	}
+	if client == nil {
+		t.Fatal("expected non-nil client")
+	}
+}
+
+func TestNewClient_UnknownMethod(t *testing.T) {
+	creds := &auth.Credentials{
+		Method: "oauth",
+	}
+
+	client, err := NewClient(creds, "https://github.com/test-org")
+	if err == nil {
+		t.Fatal("expected error for unknown method")
+	}
+	if client != nil {
+		t.Fatal("expected nil client on error")
+	}
+}
+
+func TestNewClient_AppNilCreds(t *testing.T) {
+	creds := &auth.Credentials{
+		Method:    "github_app",
+		GitHubApp: nil,
+	}
+
+	client, err := NewClient(creds, "https://github.com/test-org")
+	if err == nil {
+		t.Fatal("expected error for nil github_app creds")
+	}
+	if client != nil {
+		t.Fatal("expected nil client on error")
+	}
+}
+
+func TestNewClient_AppMissingKeyFile(t *testing.T) {
+	creds := &auth.Credentials{
+		Method: "github_app",
+		GitHubApp: &auth.GitHubAppCreds{
+			ClientID:       "Iv1.test123",
+			InstallationID: 12345,
+			PrivateKeyPath: "/nonexistent/path/key.pem",
+		},
+	}
+
+	client, err := NewClient(creds, "https://github.com/test-org")
+	if err == nil {
+		t.Fatal("expected error for missing key file")
+	}
+	if client != nil {
+		t.Fatal("expected nil client on error")
+	}
+}
+
+func TestNewClient_InvalidGitHubURL(t *testing.T) {
+	creds := &auth.Credentials{
+		Method: "pat",
+		PAT:    "ghp_test1234567890",
+	}
+
+	client, err := NewClient(creds, "://invalid-url")
+	if err == nil {
+		t.Fatal("expected error for invalid URL")
+	}
+	if client != nil {
+		t.Fatal("expected nil client on error")
+	}
+}
diff --git a/internal/health/checks.go b/internal/health/checks.go
new file mode 100644
index 0000000..3a9a94b
--- /dev/null
+++ b/internal/health/checks.go
@@ -0,0 +1,213 @@
+package health
+
+import (
+	"context"
+	"fmt"
+	"syscall"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+func (m *Monitor) runChecks(ctx context.Context) {
+	start := time.Now()
+
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	m.issues = m.issues[:0]
+
+	snapshots := m.runners.Snapshots()
+	totalActual := 0
+	totalDesired := 0
+
+	for group, snaps := range snapshots {
+		m.checkRunnerLiveness(ctx, group, snaps)
+		m.checkRunnerTimeouts(ctx, group, snaps)
+		m.checkIdleTimeouts(ctx, group, snaps)
+		gs := m.getOrCreateGroup(group)
+		m.checkGroupDivergence(group, len(snaps), gs)
+		m.checkConsecutiveFailures(group, gs)
+		totalActual += len(snaps)
+		totalDesired += gs.lastDesiredCount
+	}
+
+	m.checkDiskSpace()
+	m.lastCheck = time.Now()
+	checkDuration := time.Since(start)
+
+	for _, r := range m.reporters {
+		r.ReportDaemonHealth(ctx, len(snapshots), totalActual, totalDesired, checkDuration)
+	}
+	for group, snaps := range snapshots {
+		gs := m.getOrCreateGroup(group)
+		for _, r := range m.reporters {
+			r.ReportGroupHealth(ctx, group, len(snaps), gs.lastDesiredCount)
+		}
+	}
+
+	for _, issue := range m.issues {
+		m.notifier.Notify(ctx, &model.Event{
+			Type:      issue.Type,
+			Level:     issue.Level,
+			Group:     issue.Group,
+			Runner:    issue.Runner,
+			Message:   issue.Message,
+			Timestamp: issue.DetectedAt,
+		})
+	}
+}
+
+func (m *Monitor) checkRunnerLiveness(ctx context.Context, group string, snapshots []model.RunnerSnapshot) {
+	for _, snap := range snapshots {
+		if snap.PID <= 0 {
+			continue
+		}
+		if err := syscall.Kill(snap.PID, 0); err == syscall.ESRCH { // EPERM means the process exists, so only ESRCH counts as dead
+			m.issues = append(m.issues, model.HealthIssue{
+				Level:      model.LevelError,
+				Type:       model.EventHealthZombieRunner,
+				Group:      group,
+				Runner:     snap.Name,
+				Message:    fmt.Sprintf("runner %s (pid %d) is no longer alive", snap.Name, snap.PID),
+				DetectedAt: time.Now(),
+			})
+			if m.killer != nil {
+				if killErr := m.killer.KillRunner(ctx, group, snap.Name); killErr != nil {
+					m.logger.ErrorContext(ctx, "failed to kill zombie runner", "group", group, "runner", snap.Name, "error", killErr)
+				}
+			}
+		}
+	}
+}
+
+func (m *Monitor) checkRunnerTimeouts(ctx context.Context, group string, snapshots []model.RunnerSnapshot) {
+	if m.cfg.RunnerTimeout <= 0 {
+		return
+	}
+
+	now := time.Now()
+	for _, snap := range snapshots {
+		if snap.State != "busy" {
+			continue
+		}
+		if snap.StartedAt.IsZero() {
+			continue
+		}
+		if now.Sub(snap.StartedAt) <= m.cfg.RunnerTimeout {
+			continue
+		}
+		m.issues = append(m.issues, model.HealthIssue{
+			Level:      model.LevelWarning,
+			Type:       model.EventHealthRunnerTimeout,
+			Group:      group,
+			Runner:     snap.Name,
+			Message:    fmt.Sprintf("runner %s has been busy for %s (timeout: %s)", snap.Name, now.Sub(snap.StartedAt).Round(time.Second), m.cfg.RunnerTimeout),
+			DetectedAt: now,
+		})
+		if m.killer != nil {
+			if killErr := m.killer.KillRunner(ctx, group, snap.Name); killErr != nil {
+				m.logger.ErrorContext(ctx, "failed to kill timed-out runner", "group", group, "runner", snap.Name, "error", killErr)
+			}
+		}
+	}
+}
+
+func (m *Monitor) checkIdleTimeouts(ctx context.Context, group string, snapshots []model.RunnerSnapshot) {
+	if m.cfg.IdleTimeout <= 0 {
+		return
+	}
+
+	minRunners := 0
+	if m.cfg.GroupMinRunners != nil {
+		minRunners = m.cfg.GroupMinRunners[group]
+	}
+
+	now := time.Now()
+	var timedOut []model.RunnerSnapshot
+	for _, snap := range snapshots {
+		if snap.State != "idle" || snap.StartedAt.IsZero() {
+			continue
+		}
+		if now.Sub(snap.StartedAt) > m.cfg.IdleTimeout {
+			timedOut = append(timedOut, snap)
+		}
+	}
+
+	idleCount := 0
+	for _, snap := range snapshots {
+		if snap.State == "idle" {
+			idleCount++
+		}
+	}
+
+	killable := idleCount - minRunners
+	for _, snap := range timedOut {
+		if killable <= 0 {
+			break
+		}
+		m.issues = append(m.issues, model.HealthIssue{
+			Level:      model.LevelWarning,
+			Type:       model.EventHealthIdleTimeout,
+			Group:      group,
+			Runner:     snap.Name,
+			Message:    fmt.Sprintf("runner %s has been idle for %s (timeout: %s)", snap.Name, now.Sub(snap.StartedAt).Round(time.Second), m.cfg.IdleTimeout),
+			DetectedAt: now,
+		})
+		if m.killer != nil {
+			if killErr := m.killer.KillRunner(ctx, group, snap.Name); killErr != nil {
+				m.logger.ErrorContext(ctx, "failed to kill idle runner", "group", group, "runner", snap.Name, "error", killErr)
+			}
+		}
+		killable--
+	}
+}
+
+func (m *Monitor) checkGroupDivergence(group string, actualCount int, gs *groupState) {
+	if m.cfg.DivergenceTimeout <= 0 {
+		return
+	}
+	if gs.lastDesiredCount == 0 {
+		return
+	}
+
+	if actualCount == gs.lastDesiredCount {
+		gs.degradedSince = nil
+		return
+	}
+
+	now := time.Now()
+	if gs.degradedSince == nil {
+		gs.degradedSince = &now
+		return
+	}
+
+	if now.Sub(*gs.degradedSince) < m.cfg.DivergenceTimeout {
+		return
+	}
+
+	m.issues = append(m.issues, model.HealthIssue{
+		Level:      model.LevelWarning,
+		Type:       model.EventHealthGroupDegraded,
+		Group:      group,
+		Message:    fmt.Sprintf("group %s has %d runners but %d desired for %s", group, actualCount, gs.lastDesiredCount, now.Sub(*gs.degradedSince).Round(time.Second)),
+		DetectedAt: now,
+	})
+}
+
+func (m *Monitor) checkConsecutiveFailures(group string, gs *groupState) {
+	if m.cfg.MaxConsecutiveFailures <= 0 {
+		return
+	}
+	if gs.consecutiveFailures <= m.cfg.MaxConsecutiveFailures {
+		return
+	}
+
+	m.issues = append(m.issues, model.HealthIssue{
+		Level:      model.LevelCritical,
+		Type:       model.EventHealthGroupFailing,
+		Group:      group,
+		Message:    fmt.Sprintf("group %s has %d consecutive start failures (threshold: %d)", group, gs.consecutiveFailures, m.cfg.MaxConsecutiveFailures),
+		DetectedAt: time.Now(),
+	})
+}
diff --git a/internal/health/checks_disk.go b/internal/health/checks_disk.go
new file mode 100644
index 0000000..68665d2
--- /dev/null
+++ b/internal/health/checks_disk.go
@@ -0,0 +1,36 @@
+package health
+
+import (
+	"fmt"
+	"syscall"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+// checkDiskSpace records a warning when the space available on the root
+// filesystem falls below cfg.MinDiskSpace, which is expressed in bytes.
+// The caller must hold m.mu.
+func (m *Monitor) checkDiskSpace() {
+	if m.cfg.MinDiskSpace <= 0 {
+		return
+	}
+
+	var stat syscall.Statfs_t
+	if err := syscall.Statfs("/", &stat); err != nil {
+		m.logger.Warn("failed to check disk space", "error", err)
+		return
+	}
+
+	available := int64(stat.Bavail) * int64(stat.Bsize) //nolint:unconvert // Bsize type varies by OS
+	if available < m.cfg.MinDiskSpace {
+		m.issues = append(m.issues, model.HealthIssue{
+			Level:      model.LevelWarning,
+			Type:       model.EventHealthDiskLow,
+			Group:      "",
+			Runner:     "",
+			Message:    fmt.Sprintf("available disk space %d bytes is below minimum %d bytes", available, m.cfg.MinDiskSpace),
+			DetectedAt: time.Now(),
+		})
+	}
+}
diff --git a/internal/health/checks_test.go b/internal/health/checks_test.go
new file mode 100644
index 0000000..2f99716
--- /dev/null
+++ b/internal/health/checks_test.go
@@ -0,0 +1,317 @@
+package health
+
+import (
+	"context"
+	"fmt"
+	"log/slog"
+	"os"
+	"testing"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type noopNotifier struct {
+	events []model.Event
+}
+
+func (n *noopNotifier) Notify(_ context.Context, event *model.Event) {
+	n.events = append(n.events, *event)
+}
+
+type fakeRunnerState struct {
+	snapshots map[string][]model.RunnerSnapshot
+}
+
+func (f *fakeRunnerState) Snapshots() map[string][]model.RunnerSnapshot {
+	return f.snapshots
+}
+
+type fakeKiller struct {
+	killed []string
+	err    error
+}
+
+func (f *fakeKiller) KillRunner(_ context.Context, group string, runner string) error {
+	f.killed = append(f.killed, fmt.Sprintf("%s/%s", group, runner))
+	return f.err
+}
+
+func noopLogger() *slog.Logger {
+	return slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError + 1}))
+}
+
+func TestCheckIdleTimeouts(t *testing.T) {
+	tests := []struct {
+		name        string
+		idleTimeout time.Duration
+		snapshots   []model.RunnerSnapshot
+		wantIssues  int
+	}{
+		{
+			name:        "disabled when timeout is zero",
+			idleTimeout: 0,
+			snapshots: []model.RunnerSnapshot{
+				{Name: "r1", State: "idle", StartedAt: time.Now().Add(-1 * time.Hour)},
+			},
+			wantIssues: 0,
+		},
+		{
+			name:        "no issue when under timeout",
+			idleTimeout: 30 * time.Minute,
+			snapshots: []model.RunnerSnapshot{
+				{Name: "r1", State: "idle", StartedAt: time.Now().Add(-10 * time.Minute)},
+			},
+			wantIssues: 0,
+		},
+		{
+			name:        "issue when over timeout",
+			idleTimeout: 30 * time.Minute,
+			snapshots: []model.RunnerSnapshot{
+				{Name: "r1", State: "idle", StartedAt: time.Now().Add(-1 * time.Hour)},
+			},
+			wantIssues: 1,
+		},
+		{
+			name:        "busy runners are skipped",
+			idleTimeout: 30 * time.Minute,
+			snapshots: []model.RunnerSnapshot{
+				{Name: "r1", State: "busy", StartedAt: time.Now().Add(-1 * time.Hour)},
+			},
+			wantIssues: 0,
+		},
+		{
+			name:        "zero StartedAt is skipped",
+			idleTimeout: 30 * time.Minute,
+			snapshots: []model.RunnerSnapshot{
+				{Name: "r1", State: "idle"},
+			},
+			wantIssues: 0,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			m := newTestMonitor(nil, nil, nil)
+			m.cfg.IdleTimeout = tt.idleTimeout
+			m.issues = m.issues[:0]
+
+			m.checkIdleTimeouts(context.Background(), "test-group", tt.snapshots)
+
+			if len(m.issues) != tt.wantIssues {
+				t.Errorf("expected %d issues, got %d", tt.wantIssues, len(m.issues))
+			}
+			if tt.wantIssues > 0 && m.issues[0].Type != model.EventHealthIdleTimeout {
+				t.Errorf("expected type %s, got %s", model.EventHealthIdleTimeout, m.issues[0].Type)
+			}
+		})
+	}
+}
+
+func TestCheckGroupDivergence(t *testing.T) {
+	tests := []struct {
+		name              string
+		divergenceTimeout time.Duration
+		actualCount       int
+		desiredCount      int
+		degradedSince     *time.Time
+		wantIssues        int
+		wantDegraded      bool
+	}{
+		{
+			name:              "disabled when timeout is zero",
+			divergenceTimeout: 0,
+			actualCount:       1,
+			desiredCount:      3,
+			wantIssues:        0,
+		},
+		{
+			name:              "no issue when counts match",
+			divergenceTimeout: 5 * time.Minute,
+			actualCount:       3,
+			desiredCount:      3,
+			wantIssues:        0,
+		},
+		{
+			name:              "no issue when desired is zero",
+			divergenceTimeout: 5 * time.Minute,
+			actualCount:       1,
+			desiredCount:      0,
+			wantIssues:        0,
+		},
+		{
+			name:              "first divergence sets degradedSince",
+			divergenceTimeout: 5 * time.Minute,
+			actualCount:       1,
+			desiredCount:      3,
+			wantIssues:        0,
+			wantDegraded:      true,
+		},
+		{
+			name:              "issue after timeout exceeded",
+			divergenceTimeout: 5 * time.Minute,
+			actualCount:       1,
+			desiredCount:      3,
+			degradedSince:     timePtr(time.Now().Add(-10 * time.Minute)),
+			wantIssues:        1,
+		},
+		{
+			name:              "no issue before timeout",
+			divergenceTimeout: 5 * time.Minute,
+			actualCount:       1,
+			desiredCount:      3,
+			degradedSince:     timePtr(time.Now().Add(-2 * time.Minute)),
+			wantIssues:        0,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			m := newTestMonitor(nil, nil, nil)
+			m.cfg.DivergenceTimeout = tt.divergenceTimeout
+			m.issues = m.issues[:0]
+
+			gs := &groupState{
+				lastDesiredCount: tt.desiredCount,
+				degradedSince:    tt.degradedSince,
+			}
+
+			m.checkGroupDivergence("test-group", tt.actualCount, gs)
+
+			if len(m.issues) != tt.wantIssues {
+				t.Errorf("expected %d issues, got %d", tt.wantIssues, len(m.issues))
+			}
+			if tt.wantDegraded && gs.degradedSince == nil {
+				t.Error("expected degradedSince to be set")
+			}
+			if tt.wantIssues > 0 && m.issues[0].Type != model.EventHealthGroupDegraded {
+				t.Errorf("expected type %s, got %s", model.EventHealthGroupDegraded, m.issues[0].Type)
+			}
+		})
+	}
+}
+
+func TestCheckConsecutiveFailures(t *testing.T) {
+	tests := []struct {
+		name        string
+		maxFailures int
+		failures    int
+		wantIssues  int
+	}{
+		{
+			name:        "disabled when max is zero",
+			maxFailures: 0,
+			failures:    10,
+			wantIssues:  0,
+		},
+		{
+			name:        "no issue at threshold",
+			maxFailures: 5,
+			failures:    5,
+			wantIssues:  0,
+		},
+		{
+			name:        "issue above threshold",
+			maxFailures: 5,
+			failures:    6,
+			wantIssues:  1,
+		},
+		{
+			name:        "no issue below threshold",
+			maxFailures: 5,
+			failures:    3,
+			wantIssues:  0,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			m := newTestMonitor(nil, nil, nil)
+			m.cfg.MaxConsecutiveFailures = tt.maxFailures
+			m.issues = m.issues[:0]
+
+			gs := &groupState{consecutiveFailures: tt.failures}
+			m.checkConsecutiveFailures("test-group", gs)
+
+			if len(m.issues) != tt.wantIssues {
+				t.Errorf("expected %d issues, got %d", tt.wantIssues, len(m.issues))
+			}
+			if tt.wantIssues > 0 {
+				if m.issues[0].Type != model.EventHealthGroupFailing {
+					t.Errorf("expected type %s, got %s", model.EventHealthGroupFailing, m.issues[0].Type)
+				}
+				if m.issues[0].Level != model.LevelCritical {
+					t.Errorf("expected level %s, got %s", model.LevelCritical, m.issues[0].Level)
+				}
+			}
+		})
+	}
+}
+
+func TestCheckRunnerTimeouts_KillsRunner(t *testing.T) {
+	killer := &fakeKiller{}
+	m := NewMonitor(
+		MonitorConfig{
+			Enabled:       true,
+			RunnerTimeout: 1 * time.Hour,
+		},
+		&noopNotifier{},
+		nil,
+		nil,
+		killer,
+		noopLogger(),
+	)
+
+	snaps := []model.RunnerSnapshot{
+		{Name: "r1", State: "busy", PID: 1, StartedAt: time.Now().Add(-2 * time.Hour)},
+	}
+
+	m.checkRunnerTimeouts(context.Background(), "group-a", snaps)
+
+	if len(killer.killed) != 1 {
+		t.Fatalf("expected 1 kill call, got %d", len(killer.killed))
+	}
+	if killer.killed[0] != "group-a/r1" {
+		t.Errorf("expected kill group-a/r1, got %s", killer.killed[0])
+	}
+}
+
+func TestRunChecks_IntegrationWithNotifier(t *testing.T) {
+	notif := &noopNotifier{}
+	state := &fakeRunnerState{
+		snapshots: map[string][]model.RunnerSnapshot{
+			"group-a": {
+				{Name: "r1", State: "idle", PID: 99999999, StartedAt: time.Now().Add(-2 * time.Hour)},
+			},
+		},
+	}
+
+	m := NewMonitor(
+		MonitorConfig{
+			Enabled:       true,
+			CheckInterval: time.Second,
+			IdleTimeout:   30 * time.Minute,
+		},
+		notif,
+		state,
+		nil,
+		nil,
+		noopLogger(),
+	)
+
+	m.runChecks(context.Background())
+
+	foundIdle := false
+	for _, e := range notif.events {
+		if e.Type == model.EventHealthIdleTimeout {
+			foundIdle = true
+		}
+	}
+	if !foundIdle {
+		t.Error("expected idle timeout event to be notified")
+	}
+}
+
+func timePtr(t time.Time) *time.Time {
+	return &t
+}
diff --git a/internal/health/group_state.go b/internal/health/group_state.go
new file mode 100644
index 0000000..fe19267
--- /dev/null
+++ b/internal/health/group_state.go
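@@ -0,0 +1,49 @@
+package health
+
+import "time"
+
+// groupState holds the per-group counters read by the health checks.
+// All access is guarded by Monitor.mu.
+type groupState struct {
+	consecutiveFailures int
+	degradedSince       *time.Time
+	lastDesiredCount    int
+}
+
+// getOrCreateGroup returns the state for name, creating it on first use.
+// The caller must hold m.mu for writing.
+func (m *Monitor) getOrCreateGroup(name string) *groupState {
+	gs, ok := m.groups[name]
+	if !ok {
+		gs = &groupState{}
+		m.groups[name] = gs
+	}
+	return gs
+}
+
+// UpdateGroupStats records the latest desired runner count for group.
+func (m *Monitor) UpdateGroupStats(group string, desired int) {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	gs := m.getOrCreateGroup(group)
+	gs.lastDesiredCount = desired
+}
+
+// RecordStartFailure increments the group's consecutive start-failure counter.
+func (m *Monitor) RecordStartFailure(group string) {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	gs := m.getOrCreateGroup(group)
+	gs.consecutiveFailures++
+}
+
+// RecordStartSuccess resets the group's consecutive start-failure counter.
+func (m *Monitor) RecordStartSuccess(group string) {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	gs := m.getOrCreateGroup(group)
+	gs.consecutiveFailures = 0
+}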
diff --git a/internal/health/group_state_test.go b/internal/health/group_state_test.go
new file mode 100644
index 0000000..0681c4d
--- /dev/null
+++ b/internal/health/group_state_test.go
@@ -0,0 +1,94 @@
+package health
+
+import (
+	"testing"
+)
+
+func TestUpdateGroupStats(t *testing.T) {
+	m := newTestMonitor(nil, nil, nil)
+
+	m.UpdateGroupStats("group-a", 3)
+
+	m.mu.RLock()
+	gs, ok := m.groups["group-a"]
+	m.mu.RUnlock()
+
+	if !ok {
+		t.Fatal("expected group-a to exist in groups map")
+	}
+	if gs.lastDesiredCount != 3 {
+		t.Errorf("expected lastDesiredCount=3, got %d", gs.lastDesiredCount)
+	}
+}
+
+func TestRecordStartFailure(t *testing.T) {
+	m := newTestMonitor(nil, nil, nil)
+
+	m.RecordStartFailure("group-a")
+	m.RecordStartFailure("group-a")
+	m.RecordStartFailure("group-a")
+
+	m.mu.RLock()
+	gs := m.groups["group-a"]
+	m.mu.RUnlock()
+
+	if gs.consecutiveFailures != 3 {
+		t.Errorf("expected 3 consecutive failures, got %d", gs.consecutiveFailures)
+	}
+}
+
+func TestRecordStartSuccess_ResetsFailures(t *testing.T) {
+	m := newTestMonitor(nil, nil, nil)
+
+	m.RecordStartFailure("group-a")
+	m.RecordStartFailure("group-a")
+	m.RecordStartSuccess("group-a")
+
+	m.mu.RLock()
+	gs := m.groups["group-a"]
+	m.mu.RUnlock()
+
+	if gs.consecutiveFailures != 0 {
+		t.Errorf("expected 0 consecutive failures after success, got %d", gs.consecutiveFailures)
+	}
+}
+
+func TestGetOrCreateGroup_CreatesIfMissing(t *testing.T) {
+	m := newTestMonitor(nil, nil, nil)
+
+	m.mu.Lock()
+	gs := m.getOrCreateGroup("new-group")
+	m.mu.Unlock()
+
+	if gs == nil {
+		t.Fatal("expected non-nil groupState")
+	}
+	if gs.consecutiveFailures != 0 {
+		t.Errorf("expected 0 consecutive failures for new group, got %d", gs.consecutiveFailures)
+	}
+}
+
+func TestGetOrCreateGroup_ReturnsExisting(t *testing.T) {
+	m := newTestMonitor(nil, nil, nil)
+
+	m.RecordStartFailure("group-a")
+
+	m.mu.Lock()
+	gs := m.getOrCreateGroup("group-a")
+	m.mu.Unlock()
+
+	if gs.consecutiveFailures != 1 {
+		t.Errorf("expected 1 consecutive failure for existing group, got %d", gs.consecutiveFailures)
+	}
+}
+
+func newTestMonitor(runners RunnerStateProvider, killer RunnerKiller, reporters []Reporter) *Monitor {
+	return NewMonitor(
+		MonitorConfig{Enabled: true},
+		&noopNotifier{},
+		runners,
+		reporters,
+		killer,
+		noopLogger(),
+	)
+}
diff --git a/internal/health/monitor.go b/internal/health/monitor.go
new file mode 100644
index 0000000..f0e722f
--- /dev/null
+++ b/internal/health/monitor.go
@@ -0,0 +1,103 @@
+package health
+
+import (
+	"context"
+	"log/slog"
+	"sync"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type RunnerStateProvider interface {
+	Snapshots() map[string][]model.RunnerSnapshot
+}
+
+type Notifier interface {
+	Notify(ctx context.Context, event *model.Event)
+}
+
+type Reporter interface {
+	ReportDaemonHealth(ctx context.Context, groups int, totalActual int, totalDesired int, checkDuration time.Duration)
+	ReportGroupHealth(ctx context.Context, group string, actual int, desired int)
+}
+
+type RunnerKiller interface {
+	KillRunner(ctx context.Context, group string, runner string) error
+}
+
+type MonitorConfig struct {
+	Enabled                bool
+	CheckInterval          time.Duration
+	RunnerTimeout          time.Duration
+	IdleTimeout            time.Duration
+	DivergenceTimeout      time.Duration
+	MaxConsecutiveFailures int
+	FailureCooldown        time.Duration
+	MinDiskSpace           int64
+	GroupMinRunners        map[string]int
+}
+
+type Monitor struct {
+	cfg       MonitorConfig
+	logger    *slog.Logger
+	notifier  Notifier
+	runners   RunnerStateProvider
+	reporters []Reporter
+	killer    RunnerKiller
+
+	mu        sync.RWMutex
+	lastCheck time.Time
+	issues    []model.HealthIssue
+	groups    map[string]*groupState
+}
+
+func NewMonitor(
+	cfg MonitorConfig,
+	notifier Notifier,
+	runners RunnerStateProvider,
+	reporters []Reporter,
+	killer RunnerKiller,
+	logger *slog.Logger,
+) *Monitor {
+	return &Monitor{
+		cfg:       cfg,
+		logger:    logger,
+		notifier:  notifier,
+		runners:   runners,
+		reporters: reporters,
+		killer:    killer,
+		groups:    make(map[string]*groupState),
+	}
+}
+
+func (m *Monitor) Run(ctx context.Context) error {
+	if !m.cfg.Enabled {
+		return nil
+	}
+
+	ticker := time.NewTicker(m.cfg.CheckInterval)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ctx.Done():
+			return nil
+		case <-ticker.C:
+			m.runChecks(ctx)
+		}
+	}
+}
+
+func (m *Monitor) Status() HealthStatus {
+	m.mu.RLock()
+	defer m.mu.RUnlock()
+
+	copied := make([]model.HealthIssue, len(m.issues))
+	copy(copied, m.issues)
+
+	return HealthStatus{
+		LastCheck: m.lastCheck,
+		Issues:    copied,
+	}
+}
diff --git a/internal/health/status.go b/internal/health/status.go
new file mode 100644
index 0000000..ad0942b
--- /dev/null
+++ b/internal/health/status.go
@@ -0,0 +1,12 @@
+package health
+
+import (
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type HealthStatus struct {
+	LastCheck time.Time
+	Issues    []model.HealthIssue
+}
diff --git a/internal/launchd/launchctl.go b/internal/launchd/launchctl.go
new file mode 100644
index 0000000..0f1d44f
--- /dev/null
+++ b/internal/launchd/launchctl.go
@@ -0,0 +1,38 @@
+package launchd
+
+import (
+	"fmt"
+	"os/exec"
+)
+
+func launchctlLoad(plistPath string) error {
+	out, err := exec.Command("launchctl", "load", plistPath).CombinedOutput()
+	if err != nil {
+		return fmt.Errorf("launchctl load: %w: %s", err, string(out))
+	}
+	return nil
+}
+
+func launchctlUnload(plistPath string) error {
+	out, err := exec.Command("launchctl", "unload", plistPath).CombinedOutput()
+	if err != nil {
+		return fmt.Errorf("launchctl unload: %w: %s", err, string(out))
+	}
+	return nil
+}
+
+func launchctlStart(label string) error {
+	out, err := exec.Command("launchctl", "start", label).CombinedOutput()
+	if err != nil {
+		return fmt.Errorf("launchctl start: %w: %s", err, string(out))
+	}
+	return nil
+}
+
+func launchctlStop(label string) error {
+	out, err := exec.Command("launchctl", "stop", label).CombinedOutput()
+	if err != nil {
+		return fmt.Errorf("launchctl stop: %w: %s", err, string(out))
+	}
+	return nil
+}
diff --git a/internal/launchd/plist.go b/internal/launchd/plist.go
new file mode 100644
index 0000000..84f2b9d
--- /dev/null
+++ b/internal/launchd/plist.go
@@ -0,0 +1,56 @@
+package launchd
+
+import (
+	"bytes"
+	"fmt"
+	"text/template"
+)
+
+const plistTemplate = `<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>{{.Label}}</string>
+    <key>ProgramArguments</key>
+    <array>
+        <string>{{.BinaryPath}}</string>
+        <string>run</string>
+        <string>--config</string>
+        <string>{{.ConfigPath}}</string>
+    </array>
+    <key>RunAtLoad</key>
+    <true/>
+    <key>KeepAlive</key>
+    <dict>
+        <key>SuccessfulExit</key>
+        <false/>
+    </dict>
+    <key>StandardOutPath</key>
+    <string>{{.LogDir}}/daemon.log</string>
+    <key>StandardErrorPath</key>
+    <string>{{.LogDir}}/daemon.err</string>
+    <key>WorkingDirectory</key>
+    <string>{{.StateDir}}</string>
+    <key>EnvironmentVariables</key>
+    <dict>
+        <key>PATH</key>
+        <string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
+    </dict>
+</dict>
+</plist>
+`
+
+func generatePlist(cfg *ServiceConfig) ([]byte, error) {
+	tmpl, err := template.New("plist").Parse(plistTemplate)
+	if err != nil {
+		return nil, fmt.Errorf("parse plist template: %w", err)
+	}
+
+	var buf bytes.Buffer
+	if err := tmpl.Execute(&buf, cfg); err != nil {
+		return nil, fmt.Errorf("execute plist template: %w", err)
+	}
+
+	return buf.Bytes(), nil
+}
diff --git a/internal/launchd/plist_test.go b/internal/launchd/plist_test.go
new file mode 100644
index 0000000..ee8ff4b
--- /dev/null
+++ b/internal/launchd/plist_test.go
@@ -0,0 +1,67 @@
+package launchd
+
+import (
+	"strings"
+	"testing"
+)
+
+func TestGeneratePlist_ValidConfig(t *testing.T) {
+	cfg := ServiceConfig{
+		Label:      "com.ghr.daemon",
+		BinaryPath: "/usr/local/bin/ghr",
+		ConfigPath: "/etc/ghr/config.yaml",
+		LogDir:     "/var/log/ghr",
+		StateDir:   "/var/lib/ghr/state",
+	}
+
+	data, err := generatePlist(&cfg)
+	if err != nil {
+		t.Fatalf("generatePlist() error = %v", err)
+	}
+
+	result := string(data)
+
+	checks := []struct {
+		name     string
+		expected string
+	}{
+		{"xml header", `<?xml version="1.0" encoding="UTF-8"?>`},
+		{"label", `<string>com.ghr.daemon</string>`},
+		{"binary path", `<string>/usr/local/bin/ghr</string>`},
+		{"run command", `<string>run</string>`},
+		{"config flag", `<string>--config</string>`},
+		{"config path", `<string>/etc/ghr/config.yaml</string>`},
+		{"stdout path", `<string>/var/log/ghr/daemon.log</string>`},
+		{"stderr path", `<string>/var/log/ghr/daemon.err</string>`},
+		{"workdir", `<string>/var/lib/ghr/state</string>`},
+		{"run at load", `<true/>`},
+		{"keep alive", `<key>SuccessfulExit</key>`},
+	}
+
+	for _, tc := range checks {
+		t.Run(tc.name, func(t *testing.T) {
+			if !strings.Contains(result, tc.expected) {
+				t.Errorf("plist missing %q", tc.expected)
+			}
+		})
+	}
+}
+
+func TestGeneratePlist_SpecialChars(t *testing.T) {
+	cfg := ServiceConfig{
+		Label:      "com.ghr.test",
+		BinaryPath: "/path/with spaces/ghr",
+		ConfigPath: "/config/test.yaml",
+		LogDir:     "/tmp/logs",
+		StateDir:   "/tmp/state",
+	}
+
+	data, err := generatePlist(&cfg)
+	if err != nil {
+		t.Fatalf("generatePlist() error = %v", err)
+	}
+
+	if !strings.Contains(string(data), "/path/with spaces/ghr") {
+		t.Error("plist should preserve paths with spaces")
+	}
+}
diff --git a/internal/launchd/service.go b/internal/launchd/service.go
new file mode 100644
index 0000000..0bf494b
--- /dev/null
+++ b/internal/launchd/service.go
@@ -0,0 +1,103 @@
+package launchd
+
+import (
+	"fmt"
+	"os"
+	"os/exec"
+	"path/filepath"
+	"strconv"
+	"strings"
+)
+
+type ServiceConfig struct {
+	Label      string
+	BinaryPath string
+	ConfigPath string
+	LogDir     string
+	StateDir   string
+}
+
+func DefaultLabel() string { return "com.ghr.daemon" }
+
+func PlistPath(label string) string {
+	if os.Getuid() == 0 {
+		return filepath.Join("/Library", "LaunchDaemons", label+".plist")
+	}
+	home, err := os.UserHomeDir()
+	if err != nil {
+		home = "."
+	}
+	return filepath.Join(home, "Library", "LaunchAgents", label+".plist")
+}
+
+func Install(cfg *ServiceConfig) error {
+	data, err := generatePlist(cfg)
+	if err != nil {
+		return fmt.Errorf("generate plist: %w", err)
+	}
+
+	plistPath := PlistPath(cfg.Label)
+	dir := filepath.Dir(plistPath)
+	if err := os.MkdirAll(dir, 0o755); err != nil {
+		return fmt.Errorf("create plist directory %s: %w", dir, err)
+	}
+
+	if err := os.WriteFile(plistPath, data, 0o644); err != nil {
+		return fmt.Errorf("write plist %s: %w", plistPath, err)
+	}
+
+	if err := launchctlLoad(plistPath); err != nil {
+		return fmt.Errorf("load service %s: %w", plistPath, err)
+	}
+
+	if err := launchctlStart(cfg.Label); err != nil {
+		return fmt.Errorf("start service %s: %w", cfg.Label, err)
+	}
+
+	return nil
+}
+
+func Uninstall(label string) error {
+	plistPath := PlistPath(label)
+
+	_ = launchctlStop(label)
+	_ = launchctlUnload(plistPath)
+
+	if err := os.Remove(plistPath); err != nil && !os.IsNotExist(err) {
+		return fmt.Errorf("remove plist %s: %w", plistPath, err)
+	}
+
+	return nil
+}
+
+func IsRunning(label string) bool {
+	_, running := Status(label)
+	return running
+}
+
+func Status(label string) (int, bool) {
+	out, err := exec.Command("launchctl", "list").Output()
+	if err != nil {
+		return 0, false
+	}
+
+	for _, line := range strings.Split(string(out), "\n") {
+		if !strings.Contains(line, label) {
+			continue
+		}
+		fields := strings.Fields(line)
+		if len(fields) < 3 {
+			continue
+		}
+		if fields[2] != label {
+			continue
+		}
+		pid, parseErr := strconv.Atoi(fields[0])
+		if parseErr != nil || pid <= 0 {
+			return 0, false
+		}
+		return pid, true
+	}
+
+	return 0, false
+}
diff --git a/internal/launchd/service_test.go b/internal/launchd/service_test.go
new file mode 100644
index 0000000..29882e9
--- /dev/null
+++ b/internal/launchd/service_test.go
@@ -0,0 +1,56 @@
+package launchd
+
+import (
+	"strings"
+	"testing"
+)
+
+func TestDefaultLabel(t *testing.T) {
+	label := DefaultLabel()
+	if label != "com.ghr.daemon" {
+		t.Errorf("DefaultLabel() = %q, want %q", label, "com.ghr.daemon")
+	}
+}
+
+func TestPlistPath_NonRoot(t *testing.T) {
+	path := PlistPath("com.ghr.daemon")
+	if !strings.HasSuffix(path, "Library/LaunchAgents/com.ghr.daemon.plist") &&
+		!strings.HasSuffix(path, "Library/LaunchDaemons/com.ghr.daemon.plist") {
+		t.Errorf("PlistPath() = %q, expected LaunchAgents or LaunchDaemons suffix", path)
+	}
+}
+
+func TestPlistPath_ContainsLabel(t *testing.T) {
+	tests := []struct {
+		name  string
+		label string
+	}{
+		{"default label", "com.ghr.daemon"},
+		{"custom label", "com.ghr.test"},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			path := PlistPath(tc.label)
+			if !strings.Contains(path, tc.label+".plist") {
+				t.Errorf("PlistPath(%q) = %q, missing label in path", tc.label, path)
+			}
+		})
+	}
+}
+
+func TestStatus_NotRunning(t *testing.T) {
+	pid, running := Status("com.ghr.test.nonexistent.label.12345")
+	if running {
+		t.Errorf("Status() running = true for nonexistent label")
+	}
+	if pid != 0 {
+		t.Errorf("Status() pid = %d, want 0", pid)
+	}
+}
+
+func TestIsRunning_NotRunning(t *testing.T) {
+	if IsRunning("com.ghr.test.nonexistent.label.12345") {
+		t.Error("IsRunning() = true for nonexistent label")
+	}
+}
diff --git a/internal/logging/handler.go b/internal/logging/handler.go
new file mode 100644
index 0000000..7a8c901
--- /dev/null
+++ b/internal/logging/handler.go
@@ -0,0 +1,56 @@
+package logging
+
+import (
+	"context"
+	"fmt"
+	"log/slog"
+	"os"
+)
+
+type MultiHandler struct {
+	handlers []slog.Handler
+}
+
+func NewMultiHandler(handlers ...slog.Handler) *MultiHandler {
+	h := make([]slog.Handler, len(handlers))
+	copy(h, handlers)
+	return &MultiHandler{handlers: h}
+}
+
+func (h *MultiHandler) Enabled(ctx context.Context, level slog.Level) bool {
+	for _, handler := range h.handlers {
+		if handler.Enabled(ctx, level) {
+			return true
+		}
+	}
+	return false
+}
+
+func (h *MultiHandler) Handle(ctx context.Context, r slog.Record) error {
+	for _, handler := range h.handlers {
+		// Honor each handler's own level; clone so handlers never share record state.
+		if !handler.Enabled(ctx, r.Level) {
+			continue
+		}
+		if err := handler.Handle(ctx, r.Clone()); err != nil {
+			fmt.Fprintf(os.Stderr, "logging: handler error: %v\n", err)
+		}
+	}
+	return nil
+}
+
+func (h *MultiHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
+	cloned := make([]slog.Handler, len(h.handlers))
+	for i, handler := range h.handlers {
+		cloned[i] = handler.WithAttrs(attrs)
+	}
+	return &MultiHandler{handlers: cloned}
+}
+
+func (h *MultiHandler) WithGroup(name string) slog.Handler {
+	cloned := make([]slog.Handler, len(h.handlers))
+	for i, handler := range h.handlers {
+		cloned[i] = handler.WithGroup(name)
+	}
+	return &MultiHandler{handlers: cloned}
+}
diff --git a/internal/logging/level.go b/internal/logging/level.go
new file mode 100644
index 0000000..d79c2f2
--- /dev/null
+++ b/internal/logging/level.go
@@ -0,0 +1,29 @@
+package logging
+
+import (
+	"log/slog"
+	"strings"
+)
+
+type LogConfig struct {
+	Level         string
+	Format        string
+	Dir           string
+	RetentionDays int
+	RunnerOutput  bool
+}
+
+func ParseLevel(s string) slog.Level {
+	switch strings.ToLower(s) {
+	case "debug":
+		return slog.LevelDebug
+	case "info":
+		return slog.LevelInfo
+	case "warn":
+		return slog.LevelWarn
+	case "error":
+		return slog.LevelError
+	default:
+		return slog.LevelInfo
+	}
+}
diff --git a/internal/logging/logger_test.go b/internal/logging/logger_test.go
new file mode 100644
index 0000000..900c355
--- /dev/null
+++ b/internal/logging/logger_test.go
@@ -0,0 +1,602 @@
+package logging
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"log/slog"
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+	"time"
+)
+
+// ---------------------------------------------------------------------------
+// TestParseLevel
+// ---------------------------------------------------------------------------
+
+func TestParseLevel(t *testing.T) {
+	tests := []struct {
+		name  string
+		input string
+		want  slog.Level
+	}{
+		{"debug lowercase", "debug", slog.LevelDebug},
+		{"info lowercase", "info", slog.LevelInfo},
+		{"warn lowercase", "warn", slog.LevelWarn},
+		{"error lowercase", "error", slog.LevelError},
+		{"DEBUG uppercase", "DEBUG", slog.LevelDebug},
+		{"Info mixed case", "Info", slog.LevelInfo},
+		{"unknown defaults to info", "unknown", slog.LevelInfo},
+		{"empty defaults to info", "", slog.LevelInfo},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := ParseLevel(tt.input)
+			if got != tt.want {
+				t.Errorf("ParseLevel(%q) = %v, want %v", tt.input, got, tt.want)
+			}
+		})
+	}
+}
+
+// ---------------------------------------------------------------------------
+// TestNew
+// ---------------------------------------------------------------------------
+
+func TestNew(t *testing.T) {
+	t.Run("valid config creates dirs", func(t *testing.T) {
+		dir := t.TempDir()
+		cfg := LogConfig{
+			Level:  "info",
+			Format: "json",
+			Dir:    dir,
+		}
+
+		mgr, err := New(cfg)
+		if err != nil {
+			t.Fatalf("New() error = %v", err)
+		}
+		defer mgr.Close()
+
+		daemonDir := filepath.Join(dir, "daemon")
+		groupsDir := filepath.Join(dir, "groups")
+
+		if info, statErr := os.Stat(daemonDir); statErr != nil || !info.IsDir() {
+			t.Errorf("daemon directory not created at %s", daemonDir)
+		}
+		if info, statErr := os.Stat(groupsDir); statErr != nil || !info.IsDir() {
+			t.Errorf("groups directory not created at %s", groupsDir)
+		}
+	})
+
+	t.Run("empty Dir returns error", func(t *testing.T) {
+		cfg := LogConfig{Dir: ""}
+		_, err := New(cfg)
+		if err == nil {
+			t.Fatal("New() with empty Dir should return error")
+		}
+		if !strings.Contains(err.Error(), "dir must not be empty") {
+			t.Errorf("unexpected error message: %v", err)
+		}
+	})
+}
+
+// ---------------------------------------------------------------------------
+// TestMultiHandler
+// ---------------------------------------------------------------------------
+
+func TestMultiHandler(t *testing.T) {
+	t.Run("fans out to all handlers", func(t *testing.T) {
+		var buf1, buf2 bytes.Buffer
+		h1 := slog.NewJSONHandler(&buf1, &slog.HandlerOptions{Level: slog.LevelDebug})
+		h2 := slog.NewJSONHandler(&buf2, &slog.HandlerOptions{Level: slog.LevelDebug})
+
+		multi := NewMultiHandler(h1, h2)
+		logger := slog.New(multi)
+		logger.Info("hello multi")
+
+		for i, buf := range []*bytes.Buffer{&buf1, &buf2} {
+			content := buf.String()
+			if content == "" {
+				t.Errorf("buffer %d is empty, expected log output", i)
+				continue
+			}
+			var entry map[string]interface{}
+			if err := json.Unmarshal([]byte(strings.TrimSpace(content)), &entry); err != nil {
+				t.Errorf("buffer %d: failed to parse JSON: %v", i, err)
+				continue
+			}
+			if msg, ok := entry["msg"].(string); !ok || msg != "hello multi" {
+				t.Errorf("buffer %d: msg = %v, want %q", i, entry["msg"], "hello multi")
+			}
+		}
+	})
+
+	t.Run("WithAttrs propagates to all handlers", func(t *testing.T) {
+		var buf1, buf2 bytes.Buffer
+		h1 := slog.NewJSONHandler(&buf1, &slog.HandlerOptions{Level: slog.LevelDebug})
+		h2 := slog.NewJSONHandler(&buf2, &slog.HandlerOptions{Level: slog.LevelDebug})
+
+		multi := NewMultiHandler(h1, h2)
+		withAttrs := multi.WithAttrs([]slog.Attr{slog.String("key", "val")})
+		logger := slog.New(withAttrs)
+		logger.Info("with attrs")
+
+		for i, buf := range []*bytes.Buffer{&buf1, &buf2} {
+			content := buf.String()
+			var entry map[string]interface{}
+			if err := json.Unmarshal([]byte(strings.TrimSpace(content)), &entry); err != nil {
+				t.Errorf("buffer %d: failed to parse JSON: %v", i, err)
+				continue
+			}
+			if v, ok := entry["key"].(string); !ok || v != "val" {
+				t.Errorf("buffer %d: key = %v, want %q", i, entry["key"], "val")
+			}
+		}
+	})
+
+	t.Run("Enabled returns true if any handler is enabled", func(t *testing.T) {
+		// h1 only enabled at Error, h2 enabled at Debug
+		h1 := slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError})
+		h2 := slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug})
+		multi := NewMultiHandler(h1, h2)
+
+		// Debug should be enabled because h2 accepts it
+		if !multi.Enabled(context.Background(), slog.LevelDebug) {
+			t.Error("Enabled(Debug) = false, want true (h2 accepts Debug)")
+		}
+		// Info should be enabled because h2 accepts it
+		if !multi.Enabled(context.Background(), slog.LevelInfo) {
+			t.Error("Enabled(Info) = false, want true")
+		}
+	})
+
+	t.Run("Enabled returns false when no handler is enabled", func(t *testing.T) {
+		h1 := slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError})
+		h2 := slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError})
+		multi := NewMultiHandler(h1, h2)
+
+		if multi.Enabled(context.Background(), slog.LevelDebug) {
+			t.Error("Enabled(Debug) = true, want false (both require Error)")
+		}
+	})
+}
+
+// ---------------------------------------------------------------------------
+// helpers
+// ---------------------------------------------------------------------------
+
+// newTestManager creates a LogManager in a temporary directory with debug level.
+func newTestManager(t *testing.T) *LogManager {
+	t.Helper()
+	dir := t.TempDir()
+	cfg := LogConfig{
+		Level:        "debug",
+		Format:       "json",
+		Dir:          dir,
+		RunnerOutput: true,
+	}
+	mgr, err := New(cfg)
+	if err != nil {
+		t.Fatalf("newTestManager: %v", err)
+	}
+	t.Cleanup(func() { mgr.Close() })
+	return mgr
+}
+
+// readJSONLines reads a file and returns each line as a parsed JSON map.
+func readJSONLines(t *testing.T, path string) []map[string]interface{} {
+	t.Helper()
+	data, err := os.ReadFile(path)
+	if err != nil {
+		t.Fatalf("readJSONLines: read %s: %v", path, err)
+	}
+	var result []map[string]interface{}
+	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
+		if line == "" {
+			continue
+		}
+		var entry map[string]interface{}
+		if err := json.Unmarshal([]byte(line), &entry); err != nil {
+			t.Fatalf("readJSONLines: parse line %q: %v", line, err)
+		}
+		result = append(result, entry)
+	}
+	return result
+}
+
+// todayFile returns the log filename for the current (possibly mocked) date.
+func todayFile() string {
+	return nowFunc().Format("2006-01-02") + ".json"
+}
+
+// ---------------------------------------------------------------------------
+// TestDaemonLogger
+// ---------------------------------------------------------------------------
+
+func TestDaemonLogger(t *testing.T) {
+	mgr := newTestManager(t)
+
+	logger, err := mgr.DaemonLogger()
+	if err != nil {
+		t.Fatalf("DaemonLogger() error = %v", err)
+	}
+
+	logger.Info("daemon test message")
+
+	// Flush: close the manager so files are flushed.
+	if err := mgr.Close(); err != nil {
+		t.Fatalf("Close() error = %v", err)
+	}
+
+	logFile := filepath.Join(mgr.rootDir, "daemon", todayFile())
+	entries := readJSONLines(t, logFile)
+	if len(entries) == 0 {
+		t.Fatal("expected at least one log entry in daemon log")
+	}
+
+	found := false
+	for _, e := range entries {
+		if msg, ok := e["msg"].(string); ok && msg == "daemon test message" {
+			found = true
+			if comp, ok := e["component"].(string); !ok || comp != "daemon" {
+				t.Errorf("component = %v, want %q", e["component"], "daemon")
+			}
+		}
+	}
+	if !found {
+		t.Error("did not find 'daemon test message' in daemon log file")
+	}
+}
+
+// ---------------------------------------------------------------------------
+// TestGroupLogger
+// ---------------------------------------------------------------------------
+
+func TestGroupLogger(t *testing.T) {
+	mgr := newTestManager(t)
+
+	logger, err := mgr.GroupLogger("test-group")
+	if err != nil {
+		t.Fatalf("GroupLogger() error = %v", err)
+	}
+
+	logger.Info("group test message")
+
+	if err := mgr.Close(); err != nil {
+		t.Fatalf("Close() error = %v", err)
+	}
+
+	// Check group log file.
+	groupFile := filepath.Join(mgr.rootDir, "groups", "test-group", todayFile())
+	groupEntries := readJSONLines(t, groupFile)
+	found := false
+	for _, e := range groupEntries {
+		if msg, ok := e["msg"].(string); ok && msg == "group test message" {
+			found = true
+			if comp, ok := e["component"].(string); !ok || comp != "group" {
+				t.Errorf("component = %v, want %q", e["component"], "group")
+			}
+			if g, ok := e["group"].(string); !ok || g != "test-group" {
+				t.Errorf("group = %v, want %q", e["group"], "test-group")
+			}
+		}
+	}
+	if !found {
+		t.Errorf("did not find 'group test message' in group log file %s", groupFile)
+	}
+
+	// Check propagation to daemon log.
+	daemonFile := filepath.Join(mgr.rootDir, "daemon", todayFile())
+	daemonEntries := readJSONLines(t, daemonFile)
+	found = false
+	for _, e := range daemonEntries {
+		if msg, ok := e["msg"].(string); ok && msg == "group test message" {
+			found = true
+		}
+	}
+	if !found {
+		t.Error("group message did not propagate to daemon log file")
+	}
+}
+
+// ---------------------------------------------------------------------------
+// TestRunnerLogger
+// ---------------------------------------------------------------------------
+
+func TestRunnerLogger(t *testing.T) {
+	mgr := newTestManager(t)
+
+	logger, err := mgr.RunnerLogger("test-group", "runner-abc")
+	if err != nil {
+		t.Fatalf("RunnerLogger() error = %v", err)
+	}
+
+	logger.Info("runner test message")
+
+	if err := mgr.Close(); err != nil {
+		t.Fatalf("Close() error = %v", err)
+	}
+
+	today := todayFile()
+
+	// Verify message in runner log.
+	runnerFile := filepath.Join(mgr.rootDir, "groups", "test-group", "runners", "runner-abc", today)
+	runnerEntries := readJSONLines(t, runnerFile)
+	found := false
+	for _, e := range runnerEntries {
+		if msg, ok := e["msg"].(string); ok && msg == "runner test message" {
+			found = true
+			if comp, ok := e["component"].(string); !ok || comp != "runner" {
+				t.Errorf("runner log: component = %v, want %q", e["component"], "runner")
+			}
+			if g, ok := e["group"].(string); !ok || g != "test-group" {
+				t.Errorf("runner log: group = %v, want %q", e["group"], "test-group")
+			}
+			if r, ok := e["runner"].(string); !ok || r != "runner-abc" {
+				t.Errorf("runner log: runner = %v, want %q", e["runner"], "runner-abc")
+			}
+		}
+	}
+	if !found {
+		t.Errorf("did not find 'runner test message' in runner log file %s", runnerFile)
+	}
+
+	// Verify propagation to group log.
+	groupFile := filepath.Join(mgr.rootDir, "groups", "test-group", today)
+	groupEntries := readJSONLines(t, groupFile)
+	found = false
+	for _, e := range groupEntries {
+		if msg, ok := e["msg"].(string); ok && msg == "runner test message" {
+			found = true
+		}
+	}
+	if !found {
+		t.Error("runner message did not propagate to group log file")
+	}
+
+	// Verify propagation to daemon log.
+	daemonFile := filepath.Join(mgr.rootDir, "daemon", today)
+	daemonEntries := readJSONLines(t, daemonFile)
+	found = false
+	for _, e := range daemonEntries {
+		if msg, ok := e["msg"].(string); ok && msg == "runner test message" {
+			found = true
+		}
+	}
+	if !found {
+		t.Error("runner message did not propagate to daemon log file")
+	}
+}
+
+// ---------------------------------------------------------------------------
+// TestDateRotation
+// ---------------------------------------------------------------------------
+
+func TestDateRotation(t *testing.T) {
+	orig := nowFunc
+	defer func() { nowFunc = orig }()
+
+	day1 := time.Date(2024, 1, 15, 12, 0, 0, 0, time.UTC)
+	day2 := time.Date(2024, 1, 16, 12, 0, 0, 0, time.UTC)
+
+	nowFunc = func() time.Time { return day1 }
+
+	dir := t.TempDir()
+	w, err := newDateAwareWriter(dir)
+	if err != nil {
+		t.Fatalf("newDateAwareWriter() error = %v", err)
+	}
+	defer w.Close()
+
+	// Write on day 1.
+	_, err = w.Write([]byte("day1 line\n"))
+	if err != nil {
+		t.Fatalf("Write day1: %v", err)
+	}
+
+	file1 := filepath.Join(dir, "2024-01-15.json")
+	if _, statErr := os.Stat(file1); statErr != nil {
+		t.Errorf("expected file %s to exist after day1 write", file1)
+	}
+
+	// Advance to day 2.
+	nowFunc = func() time.Time { return day2 }
+
+	_, err = w.Write([]byte("day2 line\n"))
+	if err != nil {
+		t.Fatalf("Write day2: %v", err)
+	}
+
+	file2 := filepath.Join(dir, "2024-01-16.json")
+	if _, statErr := os.Stat(file2); statErr != nil {
+		t.Errorf("expected file %s to exist after day2 write", file2)
+	}
+
+	// Verify contents.
+	data1, err := os.ReadFile(file1)
+	if err != nil {
+		t.Fatalf("ReadFile day1: %v", err)
+	}
+	if !strings.Contains(string(data1), "day1 line") {
+		t.Errorf("day1 file content = %q, want to contain %q", data1, "day1 line")
+	}
+
+	data2, err := os.ReadFile(file2)
+	if err != nil {
+		t.Fatalf("ReadFile day2: %v", err)
+	}
+	if !strings.Contains(string(data2), "day2 line") {
+		t.Errorf("day2 file content = %q, want to contain %q", data2, "day2 line")
+	}
+}
+
+// ---------------------------------------------------------------------------
+// TestRunnerOutputFile
+// ---------------------------------------------------------------------------
+
+func TestRunnerOutputFile(t *testing.T) {
+	mgr := newTestManager(t)
+
+	wc, err := mgr.RunnerOutputFile("group", "runner")
+	if err != nil {
+		t.Fatalf("RunnerOutputFile() error = %v", err)
+	}
+
+	payload := []byte("some runner output\n")
+	n, err := wc.Write(payload)
+	if err != nil {
+		t.Fatalf("Write() error = %v", err)
+	}
+	if n != len(payload) {
+		t.Errorf("Write() wrote %d bytes, want %d", n, len(payload))
+	}
+
+	outFile := filepath.Join(mgr.rootDir, "groups", "group", "runners", "runner", todayFile())
+	if _, statErr := os.Stat(outFile); statErr != nil {
+		t.Errorf("expected output file at %s", outFile)
+	}
+
+	if err := wc.Close(); err != nil {
+		t.Errorf("Close() error = %v", err)
+	}
+
+	data, err := os.ReadFile(outFile)
+	if err != nil {
+		t.Fatalf("ReadFile: %v", err)
+	}
+	if !strings.Contains(string(data), "some runner output") {
+		t.Errorf("output file content = %q, want to contain %q", data, "some runner output")
+	}
+}
+
+// ---------------------------------------------------------------------------
+// TestCleanupOldLogs
+// ---------------------------------------------------------------------------
+
+func TestCleanupOldLogs(t *testing.T) {
+	dir := t.TempDir()
+	cfg := LogConfig{
+		Level:         "info",
+		Format:        "json",
+		Dir:           dir,
+		RetentionDays: 1,
+	}
+	mgr, err := New(cfg)
+	if err != nil {
+		t.Fatalf("New() error = %v", err)
+	}
+	defer mgr.Close()
+
+	daemonDir := filepath.Join(dir, "daemon")
+
+	// Create an old log file (modification time 3 days ago).
+	oldFile := filepath.Join(daemonDir, "2024-01-10.json")
+	if err := os.WriteFile(oldFile, []byte(`{"msg":"old"}`+"\n"), 0o644); err != nil {
+		t.Fatalf("WriteFile old: %v", err)
+	}
+	oldTime := time.Now().AddDate(0, 0, -3)
+	if err := os.Chtimes(oldFile, oldTime, oldTime); err != nil {
+		t.Fatalf("Chtimes old: %v", err)
+	}
+
+	// Create a fresh log file (modification time is now).
+	freshFile := filepath.Join(daemonDir, "2024-01-15.json")
+	if err := os.WriteFile(freshFile, []byte(`{"msg":"fresh"}`+"\n"), 0o644); err != nil {
+		t.Fatalf("WriteFile fresh: %v", err)
+	}
+
+	if err := mgr.CleanupOldLogs(); err != nil {
+		t.Fatalf("CleanupOldLogs() error = %v", err)
+	}
+
+	// Old file should be deleted.
+	if _, statErr := os.Stat(oldFile); !os.IsNotExist(statErr) {
+		t.Errorf("old file %s should have been deleted", oldFile)
+	}
+
+	// Fresh file should remain.
+	if _, statErr := os.Stat(freshFile); statErr != nil {
+		t.Errorf("fresh file %s should still exist: %v", freshFile, statErr)
+	}
+}
+
+// ---------------------------------------------------------------------------
+// TestCleanupOldLogs_Disabled
+// ---------------------------------------------------------------------------
+
+func TestCleanupOldLogs_Disabled(t *testing.T) {
+	dir := t.TempDir()
+	cfg := LogConfig{
+		Level:         "info",
+		Format:        "json",
+		Dir:           dir,
+		RetentionDays: 0, // disabled
+	}
+	mgr, err := New(cfg)
+	if err != nil {
+		t.Fatalf("New() error = %v", err)
+	}
+	defer mgr.Close()
+
+	daemonDir := filepath.Join(dir, "daemon")
+
+	// Create an old file.
+	oldFile := filepath.Join(daemonDir, "2020-01-01.json")
+	if err := os.WriteFile(oldFile, []byte(`{"msg":"ancient"}`+"\n"), 0o644); err != nil {
+		t.Fatalf("WriteFile: %v", err)
+	}
+	oldTime := time.Now().AddDate(-4, 0, 0)
+	if err := os.Chtimes(oldFile, oldTime, oldTime); err != nil {
+		t.Fatalf("Chtimes: %v", err)
+	}
+
+	if err := mgr.CleanupOldLogs(); err != nil {
+		t.Fatalf("CleanupOldLogs() error = %v", err)
+	}
+
+	// File should NOT be deleted when RetentionDays=0.
+	if _, statErr := os.Stat(oldFile); statErr != nil {
+		t.Errorf("old file %s should NOT have been deleted (RetentionDays=0): %v", oldFile, statErr)
+	}
+}
+
+// ---------------------------------------------------------------------------
+// TestClose
+// ---------------------------------------------------------------------------
+
+func TestClose(t *testing.T) {
+	mgr := newTestManager(t)
+
+	// Create several loggers to open multiple writers.
+	if _, err := mgr.DaemonLogger(); err != nil {
+		t.Fatalf("DaemonLogger: %v", err)
+	}
+	if _, err := mgr.GroupLogger("group-a"); err != nil {
+		t.Fatalf("GroupLogger: %v", err)
+	}
+	if _, err := mgr.RunnerLogger("group-a", "runner-1"); err != nil {
+		t.Fatalf("RunnerLogger: %v", err)
+	}
+
+	mgr.mu.Lock()
+	writerCount := len(mgr.writers)
+	mgr.mu.Unlock()
+	if writerCount == 0 {
+		t.Error("expected writers to be tracked before Close()")
+	}
+
+	if err := mgr.Close(); err != nil {
+		t.Fatalf("Close() error = %v", err)
+	}
+
+	mgr.mu.Lock()
+	writersAfter := mgr.writers
+	mgr.mu.Unlock()
+	if writersAfter != nil {
+		t.Errorf("expected writers to be nil after Close(), got len=%d", len(writersAfter))
+	}
+}
diff --git a/internal/logging/manager.go b/internal/logging/manager.go
new file mode 100644
index 0000000..1e1abb9
--- /dev/null
+++ b/internal/logging/manager.go
@@ -0,0 +1,180 @@
+package logging
+
+import (
+	"context"
+	"fmt"
+	"io"
+	"log/slog"
+	"os"
+	"path/filepath"
+	"strings"
+	"sync"
+	"time"
+)
+
+type LogManager struct {
+	cfg     LogConfig
+	rootDir string
+	level   slog.Level
+
+	mu      sync.Mutex
+	writers []*dateAwareWriter
+}
+
+func New(cfg LogConfig) (*LogManager, error) {
+	if cfg.Dir == "" {
+		return nil, fmt.Errorf("logging: dir must not be empty")
+	}
+
+	daemonDir := filepath.Join(cfg.Dir, "daemon")
+	groupsDir := filepath.Join(cfg.Dir, "groups")
+
+	if err := os.MkdirAll(daemonDir, 0o755); err != nil {
+		return nil, fmt.Errorf("logging: create daemon dir: %w", err)
+	}
+	if err := os.MkdirAll(groupsDir, 0o755); err != nil {
+		return nil, fmt.Errorf("logging: create groups dir: %w", err)
+	}
+
+	return &LogManager{
+		cfg:     cfg,
+		rootDir: cfg.Dir,
+		level:   ParseLevel(cfg.Level),
+	}, nil
+}
+
+func (m *LogManager) Close() error {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+	var firstErr error
+	for _, w := range m.writers {
+		if err := w.Close(); err != nil && firstErr == nil {
+			firstErr = err
+		}
+	}
+	m.writers = nil
+	return firstErr
+}
+
+func (m *LogManager) trackWriter(w *dateAwareWriter) {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+	m.writers = append(m.writers, w)
+}
+
+func (m *LogManager) consoleHandler() slog.Handler {
+	opts := &slog.HandlerOptions{Level: m.level}
+	if strings.EqualFold(m.cfg.Format, "json") {
+		return slog.NewJSONHandler(os.Stderr, opts)
+	}
+	return slog.NewTextHandler(os.Stderr, opts)
+}
+
+func (m *LogManager) fileHandler(subdir string) (slog.Handler, error) {
+	dir := filepath.Join(m.rootDir, subdir)
+	w, err := newDateAwareWriter(dir)
+	if err != nil {
+		return nil, err
+	}
+	m.trackWriter(w)
+	opts := &slog.HandlerOptions{Level: m.level}
+	return slog.NewJSONHandler(w, opts), nil
+}
+
+func (m *LogManager) DaemonLogger() (*slog.Logger, error) {
+	daemonFileH, err := m.fileHandler("daemon")
+	if err != nil {
+		return nil, fmt.Errorf("logging: daemon file handler: %w", err)
+	}
+	multi := NewMultiHandler(daemonFileH, m.consoleHandler())
+	return slog.New(multi).With("component", "daemon"), nil
+}
+
+func (m *LogManager) GroupLogger(group string) (*slog.Logger, error) {
+	groupDir := filepath.Join("groups", group)
+	groupFileH, err := m.fileHandler(groupDir)
+	if err != nil {
+		return nil, fmt.Errorf("logging: group file handler for %q: %w", group, err)
+	}
+
+	daemonFileH, err := m.fileHandler("daemon")
+	if err != nil {
+		return nil, fmt.Errorf("logging: daemon file handler (group %q): %w", group, err)
+	}
+
+	multi := NewMultiHandler(groupFileH, daemonFileH, m.consoleHandler())
+	return slog.New(multi).With("component", "group", "group", group), nil
+}
+
+func (m *LogManager) RunnerLogger(group, runner string) (*slog.Logger, error) {
+	runnerDir := filepath.Join("groups", group, "runners", runner)
+	runnerFileH, err := m.fileHandler(runnerDir)
+	if err != nil {
+		return nil, fmt.Errorf("logging: runner file handler for %q/%q: %w", group, runner, err)
+	}
+
+	groupDir := filepath.Join("groups", group)
+	groupFileH, err := m.fileHandler(groupDir)
+	if err != nil {
+		return nil, fmt.Errorf("logging: group file handler for runner %q/%q: %w", group, runner, err)
+	}
+
+	daemonFileH, err := m.fileHandler("daemon")
+	if err != nil {
+		return nil, fmt.Errorf("logging: daemon file handler (runner %q/%q): %w", group, runner, err)
+	}
+
+	multi := NewMultiHandler(runnerFileH, groupFileH, daemonFileH, m.consoleHandler())
+	return slog.New(multi).With("component", "runner", "group", group, "runner", runner), nil
+}
+
+func (m *LogManager) RunnerOutputFile(group, runner string) (io.WriteCloser, error) {
+	dir := filepath.Join(m.rootDir, "groups", group, "runners", runner)
+	w, err := newDateAwareWriter(dir)
+	if err != nil {
+		return nil, fmt.Errorf("logging: runner output file for %q/%q: %w", group, runner, err)
+	}
+	m.trackWriter(w)
+	return w, nil
+}
+
+func (m *LogManager) StartCleanupScheduler(ctx context.Context) error {
+	ticker := time.NewTicker(24 * time.Hour)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ctx.Done():
+			return nil
+		case <-ticker.C:
+			if err := m.CleanupOldLogs(); err != nil {
+				fmt.Fprintf(os.Stderr, "log cleanup error: %v\n", err)
+			}
+		}
+	}
+}
+
+func (m *LogManager) CleanupOldLogs() error {
+	if m.cfg.RetentionDays <= 0 {
+		return nil
+	}
+	cutoff := nowFunc().AddDate(0, 0, -m.cfg.RetentionDays)
+
+	return filepath.Walk(m.rootDir, func(path string, info os.FileInfo, err error) error {
+		if err != nil {
+			return fmt.Errorf("logging: walk %s: %w", path, err)
+		}
+		if info.IsDir() {
+			return nil
+		}
+		if !strings.HasSuffix(info.Name(), ".json") {
+			return nil
+		}
+		if info.ModTime().Before(cutoff) {
+			if removeErr := os.Remove(path); removeErr != nil {
+				return fmt.Errorf("logging: remove old log %s: %w", path, removeErr)
+			}
+		}
+		return nil
+	})
+}
diff --git a/internal/logging/writer.go b/internal/logging/writer.go
new file mode 100644
index 0000000..5a14652
--- /dev/null
+++ b/internal/logging/writer.go
@@ -0,0 +1,65 @@
+package logging
+
+import (
+	"fmt"
+	"os"
+	"path/filepath"
+	"sync"
+	"time"
+)
+
+var nowFunc = time.Now
+
+type dateAwareWriter struct {
+	mu      sync.Mutex
+	dir     string
+	current *os.File
+	today   string
+}
+
+func newDateAwareWriter(dir string) (*dateAwareWriter, error) {
+	if err := os.MkdirAll(dir, 0o755); err != nil {
+		return nil, fmt.Errorf("logging: create dir %s: %w", dir, err)
+	}
+	w := &dateAwareWriter{dir: dir}
+	if err := w.rotate(); err != nil {
+		return nil, err
+	}
+	return w, nil
+}
+
+func (w *dateAwareWriter) rotate() error {
+	today := nowFunc().Format("2006-01-02")
+	if w.current != nil && w.today == today {
+		return nil
+	}
+	if w.current != nil {
+		_ = w.current.Close() // best effort: a failed close must not block rotation
+	}
+	path := filepath.Join(w.dir, today+".json")
+	f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
+	if err != nil {
+		return fmt.Errorf("logging: open %s: %w", path, err)
+	}
+	w.current = f
+	w.today = today
+	return nil
+}
+
+func (w *dateAwareWriter) Write(p []byte) (int, error) {
+	w.mu.Lock()
+	defer w.mu.Unlock()
+	if err := w.rotate(); err != nil {
+		return 0, err
+	}
+	return w.current.Write(p)
+}
+
+func (w *dateAwareWriter) Close() error {
+	w.mu.Lock()
+	defer w.mu.Unlock()
+	if w.current != nil {
+		return w.current.Close()
+	}
+	return nil
+}
diff --git a/internal/model/event.go b/internal/model/event.go
new file mode 100644
index 0000000..5b9258d
--- /dev/null
+++ b/internal/model/event.go
@@ -0,0 +1,47 @@
+package model
+
+import "time"
+
+type EventLevel string
+
+const (
+	LevelInfo     EventLevel = "info"
+	LevelWarning  EventLevel = "warning"
+	LevelError    EventLevel = "error"
+	LevelCritical EventLevel = "critical"
+)
+
+const (
+	EventDaemonStart = "daemon.start"
+	EventDaemonStop  = "daemon.stop"
+	EventDaemonCrash = "daemon.crash"
+
+	EventGroupCreated   = "group.created"
+	EventGroupDeleted   = "group.deleted"
+	EventGroupScaleUp   = "group.scale_up"
+	EventGroupScaleDown = "group.scale_down"
+
+	EventRunnerStarted   = "runner.started"
+	EventRunnerCompleted = "runner.completed"
+	EventRunnerFailed    = "runner.failed"
+	EventRunnerTimeout   = "runner.timeout"
+
+	EventHealthZombieRunner      = "health.zombie_runner"
+	EventHealthRunnerTimeout     = "health.runner_timeout"
+	EventHealthGroupDegraded     = "health.group_degraded"
+	EventHealthGroupDisconnected = "health.group_disconnected"
+	EventHealthGroupFailing      = "health.group_failing"
+	EventHealthDiskLow           = "health.disk_low"
+	EventHealthOrphanKilled      = "health.orphan_killed"
+	EventHealthIdleTimeout       = "health.idle_timeout"
+)
+
+type Event struct {
+	Type      string
+	Level     EventLevel
+	Group     string
+	Runner    string
+	Message   string
+	Details   map[string]string
+	Timestamp time.Time
+}
diff --git a/internal/model/group.go b/internal/model/group.go
new file mode 100644
index 0000000..8e7e366
--- /dev/null
+++ b/internal/model/group.go
@@ -0,0 +1,29 @@
+package model
+
+import "time"
+
+type Group struct {
+	Name        string
+	MaxRunners  int
+	MinRunners  int
+	Labels      []string
+	RunnerGroup string
+}
+
+type RunnerInstance struct {
+	ID      string
+	Name    string
+	Group   string
+	WorkDir string
+	Version string
+}
+
+type RunnerSnapshot struct {
+	Name      string    `json:"name"`
+	Group     string    `json:"group"`
+	State     string    `json:"state"`
+	PID       int       `json:"pid"`
+	StartedAt time.Time `json:"started_at"`
+	JobName   string    `json:"job_name"`
+	JobID     string    `json:"job_id"`
+}
diff --git a/internal/model/health.go b/internal/model/health.go
new file mode 100644
index 0000000..fd10efa
--- /dev/null
+++ b/internal/model/health.go
@@ -0,0 +1,21 @@
+package model
+
+import "time"
+
+type GroupHealthStatus struct {
+	Actual  int
+	Desired int
+	Max     int
+	Min     int
+	Healthy bool
+	Issues  []HealthIssue
+}
+
+type HealthIssue struct {
+	Level      EventLevel `json:"level"`
+	Type       string     `json:"type"`
+	Group      string     `json:"group"`
+	Runner     string     `json:"runner"`
+	Message    string     `json:"message"`
+	DetectedAt time.Time  `json:"detected_at"`
+}
diff --git a/internal/monitoring/uptimekuma.go b/internal/monitoring/uptimekuma.go
new file mode 100644
index 0000000..1803037
--- /dev/null
+++ b/internal/monitoring/uptimekuma.go
@@ -0,0 +1,118 @@
+package monitoring
+
+import (
+	"context"
+	"fmt"
+	"log/slog"
+	"net/http"
+	"net/url"
+	"strings"
+	"time"
+)
+
+type UptimeKumaConfig struct {
+	BaseURL            string
+	DaemonToken        string
+	GroupTokens        map[string]string
+	DegradedThreshold  float64
+	ReportHealthAsPing bool
+}
+
+type UptimeKuma struct {
+	cfg    UptimeKumaConfig
+	client *http.Client
+	logger *slog.Logger
+}
+
+func NewUptimeKuma(cfg UptimeKumaConfig, logger *slog.Logger) *UptimeKuma {
+	return &UptimeKuma{
+		cfg: cfg,
+		client: &http.Client{
+			Timeout: 10 * time.Second,
+		},
+		logger: logger,
+	}
+}
+
+func (u *UptimeKuma) ReportDaemonHealth(ctx context.Context, groups, totalActual, totalDesired int, checkDuration time.Duration) {
+	if u.cfg.DaemonToken == "" {
+		return
+	}
+
+	msg := fmt.Sprintf("groups=%d runners=%d/%d", groups, totalActual, totalDesired)
+	ping := float64(checkDuration.Milliseconds())
+
+	pushErr := u.push(ctx, u.cfg.DaemonToken, "up", msg, ping)
+	if pushErr != nil {
+		u.logger.Warn("uptime-kuma daemon push failed", "error", pushErr)
+	}
+}
+
+func (u *UptimeKuma) ReportGroupHealth(ctx context.Context, group string, actual, desired int) {
+	token, ok := u.cfg.GroupTokens[group]
+	if !ok || token == "" {
+		return
+	}
+
+	status, msg := groupStatus(actual, desired, u.cfg.DegradedThreshold)
+	ping := -1.0
+	if u.cfg.ReportHealthAsPing && desired > 0 {
+		ping = (float64(actual) / float64(desired)) * 100
+	}
+
+	pushErr := u.push(ctx, token, status, msg, ping)
+	if pushErr != nil {
+		u.logger.Warn("uptime-kuma group push failed", "group", group, "error", pushErr)
+	}
+}
+
+func (u *UptimeKuma) push(ctx context.Context, token, status, msg string, ping float64) error {
+	baseURL := strings.TrimRight(u.cfg.BaseURL, "/")
+	pushURL := fmt.Sprintf("%s/api/push/%s?status=%s&msg=%s",
+		baseURL, url.PathEscape(token), url.QueryEscape(status), url.QueryEscape(truncateMsg(msg, 250)))
+
+	if ping >= 0 {
+		pushURL += fmt.Sprintf("&ping=%.1f", ping)
+	}
+
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, pushURL, http.NoBody)
+	if err != nil {
+		return fmt.Errorf("create push request: %w", err)
+	}
+
+	resp, err := u.client.Do(req)
+	if err != nil {
+		return fmt.Errorf("push request: %w", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		return fmt.Errorf("push failed: HTTP %d", resp.StatusCode)
+	}
+	return nil
+}
+
+func groupStatus(actual, desired int, threshold float64) (status, msg string) {
+	if desired == 0 {
+		return "up", "idle (0 desired)"
+	}
+	if actual == 0 {
+		return "down", fmt.Sprintf("0/%d runners (outage)", desired)
+	}
+
+	ratio := float64(actual) / float64(desired)
+	if ratio < threshold {
+		return "down", fmt.Sprintf("%d/%d runners (critical)", actual, desired)
+	}
+	if actual < desired {
+		return "up", fmt.Sprintf("%d/%d runners (degraded)", actual, desired)
+	}
+	return "up", fmt.Sprintf("%d/%d runners", actual, desired)
+}
+
+func truncateMsg(s string, maxLen int) string {
+	if len(s) <= maxLen {
+		return s
+	}
+	return strings.ToValidUTF8(s[:maxLen], "") // drop a rune split by the byte cut
+}
diff --git a/internal/notification/discord.go b/internal/notification/discord.go
new file mode 100644
index 0000000..38b157e
--- /dev/null
+++ b/internal/notification/discord.go
@@ -0,0 +1,122 @@
+package notification
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"strconv"
+	"sync"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+const discordMinInterval = 400 * time.Millisecond
+
+type DiscordConfig struct {
+	WebhookURL string
+	Username   string
+	AvatarURL  string
+	Mentions   DiscordMentions
+}
+
+type DiscordMentions struct {
+	Error    string
+	Critical string
+}
+
+type DiscordProvider struct {
+	cfg      DiscordConfig
+	client   *http.Client
+	mu       sync.Mutex
+	lastSend time.Time
+}
+
+func NewDiscord(cfg *DiscordConfig) *DiscordProvider {
+	return &DiscordProvider{
+		cfg:    *cfg,
+		client: &http.Client{Timeout: 10 * time.Second},
+	}
+}
+
+func (d *DiscordProvider) Name() string { return "discord" }
+
+func (d *DiscordProvider) Send(ctx context.Context, event *model.Event) error {
+	d.throttle()
+
+	payload := d.buildPayload(event)
+
+	body, err := json.Marshal(payload)
+	if err != nil {
+		return fmt.Errorf("marshal discord payload: %w", err)
+	}
+
+	resp, err := d.doPost(ctx, body)
+	if err != nil {
+		return err
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode == http.StatusTooManyRequests {
+		retryAfter := parseRetryAfter(resp.Header.Get("Retry-After"))
+		_, _ = io.Copy(io.Discard, resp.Body)
+		resp.Body.Close()
+
+		select {
+		case <-ctx.Done():
+			return fmt.Errorf("discord rate limited, context canceled: %w", ctx.Err())
+		case <-time.After(retryAfter):
+		}
+
+		resp, err = d.doPost(ctx, body)
+		if err != nil {
+			return err
+		}
+		defer resp.Body.Close()
+	}
+
+	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+		return fmt.Errorf("discord webhook returned status %d", resp.StatusCode)
+	}
+
+	return nil
+}
+
+func (d *DiscordProvider) throttle() {
+	d.mu.Lock()
+	defer d.mu.Unlock()
+
+	elapsed := time.Since(d.lastSend)
+	if elapsed < discordMinInterval {
+		time.Sleep(discordMinInterval - elapsed)
+	}
+	d.lastSend = time.Now()
+}
+
+func (d *DiscordProvider) doPost(ctx context.Context, body []byte) (*http.Response, error) {
+	req, err := http.NewRequestWithContext(ctx, http.MethodPost, d.cfg.WebhookURL, bytes.NewReader(body))
+	if err != nil {
+		return nil, fmt.Errorf("create discord request: %w", err)
+	}
+	req.Header.Set("Content-Type", "application/json")
+
+	resp, err := d.client.Do(req)
+	if err != nil {
+		return nil, fmt.Errorf("send discord webhook: %w", err)
+	}
+	return resp, nil
+}
+
+func parseRetryAfter(value string) time.Duration {
+	if value == "" {
+		return time.Second
+	}
+	seconds, err := strconv.ParseFloat(value, 64)
+	if err != nil {
+		return time.Second
+	}
+	return time.Duration(seconds * float64(time.Second))
+}
diff --git a/internal/notification/discord_payload.go b/internal/notification/discord_payload.go
new file mode 100644
index 0000000..496f5d1
--- /dev/null
+++ b/internal/notification/discord_payload.go
@@ -0,0 +1,107 @@
+package notification
+
+import (
+	"sort"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type discordPayload struct {
+	Username  string         `json:"username,omitempty"`
+	AvatarURL string         `json:"avatar_url,omitempty"`
+	Content   string         `json:"content,omitempty"`
+	Embeds    []discordEmbed `json:"embeds"`
+}
+
+type discordEmbed struct {
+	Title       string         `json:"title"`
+	Description string         `json:"description"`
+	Color       int            `json:"color"`
+	Fields      []discordField `json:"fields,omitempty"`
+	Footer      *discordFooter `json:"footer,omitempty"`
+	Timestamp   string         `json:"timestamp,omitempty"`
+}
+
+type discordField struct {
+	Name   string `json:"name"`
+	Value  string `json:"value"`
+	Inline bool   `json:"inline"`
+}
+
+type discordFooter struct {
+	Text string `json:"text"`
+}
+
+func (d *DiscordProvider) buildPayload(event *model.Event) discordPayload {
+	fields := d.buildFields(event)
+	embed := discordEmbed{
+		Title:       event.Type,
+		Description: event.Message,
+		Color:       colorForLevel(event.Level),
+		Fields:      fields,
+		Footer:      &discordFooter{Text: "ghr"},
+		Timestamp:   event.Timestamp.UTC().Format("2006-01-02T15:04:05Z"),
+	}
+
+	payload := discordPayload{
+		Username:  d.cfg.Username,
+		AvatarURL: d.cfg.AvatarURL,
+		Embeds:    []discordEmbed{embed},
+	}
+
+	mention := d.mentionForLevel(event.Level)
+	if mention != "" {
+		payload.Content = mention
+	}
+
+	return payload
+}
+
+func (d *DiscordProvider) buildFields(event *model.Event) []discordField {
+	var fields []discordField
+
+	if event.Group != "" {
+		fields = append(fields, discordField{Name: "Group", Value: event.Group, Inline: true})
+	}
+	if event.Runner != "" {
+		fields = append(fields, discordField{Name: "Runner", Value: event.Runner, Inline: true})
+	}
+
+	keys := make([]string, 0, len(event.Details))
+	for k := range event.Details {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+
+	for _, k := range keys {
+		fields = append(fields, discordField{Name: k, Value: event.Details[k], Inline: false})
+	}
+
+	return fields
+}
+
+func (d *DiscordProvider) mentionForLevel(level model.EventLevel) string {
+	switch level {
+	case model.LevelError:
+		return d.cfg.Mentions.Error
+	case model.LevelCritical:
+		return d.cfg.Mentions.Critical
+	default:
+		return ""
+	}
+}
+
+func colorForLevel(level model.EventLevel) int {
+	switch level {
+	case model.LevelInfo:
+		return 0x3498DB
+	case model.LevelWarning:
+		return 0xF39C12
+	case model.LevelError:
+		return 0xE74C3C
+	case model.LevelCritical:
+		return 0x992D22
+	default:
+		return 0x3498DB
+	}
+}
diff --git a/internal/notification/discord_test.go b/internal/notification/discord_test.go
new file mode 100644
index 0000000..4572e6b
--- /dev/null
+++ b/internal/notification/discord_test.go
@@ -0,0 +1,245 @@
+package notification
+
+import (
+	"context"
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+	"testing"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+func TestDiscordProvider_Name(t *testing.T) {
+	d := NewDiscord(&DiscordConfig{})
+	if d.Name() != "discord" {
+		t.Errorf("Name() = %q, want %q", d.Name(), "discord")
+	}
+}
+
+func TestDiscordProvider_Send(t *testing.T) {
+	baseEvent := model.Event{
+		Type:      "health.zombie_runner",
+		Level:     model.LevelError,
+		Group:     "backend",
+		Runner:    "runner-abc",
+		Message:   "Zombie runner detected",
+		Details:   map[string]string{"pid": "12345", "action": "killed"},
+		Timestamp: time.Date(2025, 1, 15, 14, 30, 0, 0, time.UTC),
+	}
+
+	t.Run("sends valid payload", func(t *testing.T) {
+		var received discordPayload
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			if r.Method != http.MethodPost {
+				t.Errorf("method = %s, want POST", r.Method)
+			}
+			if ct := r.Header.Get("Content-Type"); ct != "application/json" {
+				t.Errorf("Content-Type = %q, want application/json", ct)
+			}
+			if err := json.NewDecoder(r.Body).Decode(&received); err != nil {
+				t.Errorf("decode body: %v", err) // Errorf: handlers run off the test goroutine
+			}
+			w.WriteHeader(http.StatusNoContent)
+		}))
+		defer srv.Close()
+
+		d := NewDiscord(&DiscordConfig{
+			WebhookURL: srv.URL,
+			Username:   "ghr-test",
+			Mentions:   DiscordMentions{Error: "<@&123>"},
+		})
+
+		err := d.Send(context.Background(), &baseEvent)
+		if err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		if received.Username != "ghr-test" {
+			t.Errorf("username = %q, want %q", received.Username, "ghr-test")
+		}
+		if received.Content != "<@&123>" {
+			t.Errorf("content = %q, want %q", received.Content, "<@&123>")
+		}
+		if len(received.Embeds) != 1 {
+			t.Fatalf("len(embeds) = %d, want 1", len(received.Embeds))
+		}
+		embed := received.Embeds[0]
+		if embed.Title != "health.zombie_runner" {
+			t.Errorf("title = %q, want %q", embed.Title, "health.zombie_runner")
+		}
+		if embed.Description != "Zombie runner detected" {
+			t.Errorf("description = %q, want %q", embed.Description, "Zombie runner detected")
+		}
+		if embed.Color != 0xE74C3C {
+			t.Errorf("color = %d, want %d", embed.Color, 0xE74C3C)
+		}
+		if embed.Timestamp != "2025-01-15T14:30:00Z" {
+			t.Errorf("timestamp = %q, want %q", embed.Timestamp, "2025-01-15T14:30:00Z")
+		}
+	})
+
+	t.Run("includes group and runner fields", func(t *testing.T) {
+		var received discordPayload
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			if err := json.NewDecoder(r.Body).Decode(&received); err != nil {
+				t.Errorf("decode: %v", err)
+			}
+			w.WriteHeader(http.StatusNoContent)
+		}))
+		defer srv.Close()
+
+		d := NewDiscord(&DiscordConfig{WebhookURL: srv.URL})
+		if err := d.Send(context.Background(), &baseEvent); err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		fields := received.Embeds[0].Fields
+		if len(fields) < 2 {
+			t.Fatalf("got %d fields, want at least 2", len(fields))
+		}
+		if fields[0].Name != "Group" || fields[0].Value != "backend" {
+			t.Errorf("field[0] = %v, want Group=backend", fields[0])
+		}
+		if fields[1].Name != "Runner" || fields[1].Value != "runner-abc" {
+			t.Errorf("field[1] = %v, want Runner=runner-abc", fields[1])
+		}
+	})
+
+	t.Run("no mention for info level", func(t *testing.T) {
+		var received discordPayload
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			if err := json.NewDecoder(r.Body).Decode(&received); err != nil {
+				t.Errorf("decode: %v", err)
+			}
+			w.WriteHeader(http.StatusNoContent)
+		}))
+		defer srv.Close()
+
+		d := NewDiscord(&DiscordConfig{
+			WebhookURL: srv.URL,
+			Mentions:   DiscordMentions{Error: "<@&123>", Critical: "@everyone"},
+		})
+
+		infoEvent := baseEvent
+		infoEvent.Level = model.LevelInfo
+
+		if err := d.Send(context.Background(), &infoEvent); err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		if received.Content != "" {
+			t.Errorf("content = %q, want empty for info level", received.Content)
+		}
+	})
+
+	t.Run("critical mention", func(t *testing.T) {
+		var received discordPayload
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			if err := json.NewDecoder(r.Body).Decode(&received); err != nil {
+				t.Errorf("decode: %v", err)
+			}
+			w.WriteHeader(http.StatusNoContent)
+		}))
+		defer srv.Close()
+
+		d := NewDiscord(&DiscordConfig{
+			WebhookURL: srv.URL,
+			Mentions:   DiscordMentions{Critical: "@everyone"},
+		})
+
+		critEvent := baseEvent
+		critEvent.Level = model.LevelCritical
+
+		if err := d.Send(context.Background(), &critEvent); err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		if received.Content != "@everyone" {
+			t.Errorf("content = %q, want @everyone", received.Content)
+		}
+	})
+
+	t.Run("rate limit returns error", func(t *testing.T) {
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			w.WriteHeader(http.StatusTooManyRequests)
+		}))
+		defer srv.Close()
+
+		d := NewDiscord(&DiscordConfig{WebhookURL: srv.URL})
+		err := d.Send(context.Background(), &baseEvent)
+		if err == nil {
+			t.Fatal("expected error for 429")
+		}
+	})
+
+	t.Run("non-2xx returns error", func(t *testing.T) {
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			w.WriteHeader(http.StatusInternalServerError)
+		}))
+		defer srv.Close()
+
+		d := NewDiscord(&DiscordConfig{WebhookURL: srv.URL})
+		err := d.Send(context.Background(), &baseEvent)
+		if err == nil {
+			t.Fatal("expected error for 500")
+		}
+	})
+
+	t.Run("empty group and runner omits those fields", func(t *testing.T) {
+		var received discordPayload
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			if err := json.NewDecoder(r.Body).Decode(&received); err != nil {
+				t.Errorf("decode: %v", err)
+			}
+			w.WriteHeader(http.StatusNoContent)
+		}))
+		defer srv.Close()
+
+		d := NewDiscord(&DiscordConfig{WebhookURL: srv.URL})
+		evt := model.Event{
+			Type:      "daemon.start",
+			Level:     model.LevelInfo,
+			Message:   "started",
+			Timestamp: time.Now(),
+		}
+
+		if err := d.Send(context.Background(), &evt); err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		for _, f := range received.Embeds[0].Fields {
+			if f.Name == "Group" || f.Name == "Runner" {
+				t.Errorf("unexpected field %q for empty group/runner", f.Name)
+			}
+		}
+	})
+}
+
+func TestColorForLevel(t *testing.T) {
+	tests := []struct {
+		level model.EventLevel
+		want  int
+	}{
+		{model.LevelInfo, 0x3498DB},
+		{model.LevelWarning, 0xF39C12},
+		{model.LevelError, 0xE74C3C},
+		{model.LevelCritical, 0x992D22},
+		{model.EventLevel("unknown"), 0x3498DB},
+	}
+
+	for _, tt := range tests {
+		t.Run(string(tt.level), func(t *testing.T) {
+			got := colorForLevel(tt.level)
+			if got != tt.want {
+				t.Errorf("colorForLevel(%q) = %d, want %d", tt.level, got, tt.want)
+			}
+		})
+	}
+}
diff --git a/internal/notification/filter.go b/internal/notification/filter.go
new file mode 100644
index 0000000..642781f
--- /dev/null
+++ b/internal/notification/filter.go
@@ -0,0 +1,33 @@
+package notification
+
+import "strings"
+
+type EventFilter struct {
+	Patterns []string
+}
+
+func (f EventFilter) Matches(eventType, level string) bool {
+	if len(f.Patterns) == 0 {
+		return true
+	}
+
+	for _, p := range f.Patterns {
+		if matchesPattern(p, eventType, level) {
+			return true
+		}
+	}
+	return false
+}
+
+func matchesPattern(pattern, eventType, level string) bool {
+	if strings.HasPrefix(pattern, "*:") {
+		return strings.EqualFold(pattern[2:], level)
+	}
+
+	if strings.HasSuffix(pattern, ".*") {
+		prefix := pattern[:len(pattern)-2]
+		return strings.HasPrefix(eventType, prefix+".")
+	}
+
+	return pattern == eventType
+}
diff --git a/internal/notification/filter_test.go b/internal/notification/filter_test.go
new file mode 100644
index 0000000..489e89f
--- /dev/null
+++ b/internal/notification/filter_test.go
@@ -0,0 +1,115 @@
+package notification
+
+import "testing"
+
+func TestEventFilter_Matches(t *testing.T) {
+	tests := []struct {
+		name      string
+		patterns  []string
+		eventType string
+		level     string
+		want      bool
+	}{
+		{
+			name:      "empty patterns matches everything",
+			patterns:  nil,
+			eventType: "daemon.start",
+			level:     "info",
+			want:      true,
+		},
+		{
+			name:      "exact match",
+			patterns:  []string{"daemon.start"},
+			eventType: "daemon.start",
+			level:     "info",
+			want:      true,
+		},
+		{
+			name:      "exact match no match",
+			patterns:  []string{"daemon.stop"},
+			eventType: "daemon.start",
+			level:     "info",
+			want:      false,
+		},
+		{
+			name:      "wildcard matches prefix",
+			patterns:  []string{"health.*"},
+			eventType: "health.zombie_runner",
+			level:     "error",
+			want:      true,
+		},
+		{
+			name:      "wildcard does not match different prefix",
+			patterns:  []string{"health.*"},
+			eventType: "daemon.start",
+			level:     "info",
+			want:      false,
+		},
+		{
+			name:      "wildcard does not match partial prefix",
+			patterns:  []string{"health.*"},
+			eventType: "healthcheck.run",
+			level:     "info",
+			want:      false,
+		},
+		{
+			name:      "level filter matches",
+			patterns:  []string{"*:error"},
+			eventType: "health.zombie_runner",
+			level:     "error",
+			want:      true,
+		},
+		{
+			name:      "level filter does not match different level",
+			patterns:  []string{"*:error"},
+			eventType: "daemon.start",
+			level:     "info",
+			want:      false,
+		},
+		{
+			name:      "level filter case insensitive",
+			patterns:  []string{"*:Error"},
+			eventType: "anything",
+			level:     "error",
+			want:      true,
+		},
+		{
+			name:      "multiple patterns any match succeeds",
+			patterns:  []string{"daemon.start", "health.*", "*:critical"},
+			eventType: "health.disk_low",
+			level:     "warning",
+			want:      true,
+		},
+		{
+			name:      "multiple patterns none match",
+			patterns:  []string{"daemon.start", "runner.failed"},
+			eventType: "health.zombie_runner",
+			level:     "error",
+			want:      false,
+		},
+		{
+			name:      "empty patterns list explicit",
+			patterns:  []string{},
+			eventType: "daemon.start",
+			level:     "info",
+			want:      true,
+		},
+		{
+			name:      "wildcard matches exact prefix dot event",
+			patterns:  []string{"runner.*"},
+			eventType: "runner.started",
+			level:     "info",
+			want:      true,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			f := EventFilter{Patterns: tt.patterns}
+			got := f.Matches(tt.eventType, tt.level)
+			if got != tt.want {
+				t.Errorf("Matches(%q, %q) = %v, want %v", tt.eventType, tt.level, got, tt.want)
+			}
+		})
+	}
+}
diff --git a/internal/notification/service.go b/internal/notification/service.go
new file mode 100644
index 0000000..40e4f33
--- /dev/null
+++ b/internal/notification/service.go
@@ -0,0 +1,54 @@
+package notification
+
+import (
+	"context"
+	"log/slog"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type Provider interface {
+	Name() string
+	Send(ctx context.Context, event *model.Event) error
+}
+
+type Service struct {
+	logger    *slog.Logger
+	providers []providerEntry
+}
+
+type providerEntry struct {
+	provider Provider
+	filter   EventFilter
+}
+
+func New(providers []Provider, filters map[string]EventFilter, logger *slog.Logger) *Service {
+	entries := make([]providerEntry, 0, len(providers))
+	for _, p := range providers {
+		f := filters[p.Name()]
+		entries = append(entries, providerEntry{
+			provider: p,
+			filter:   f,
+		})
+	}
+	return &Service{
+		logger:    logger,
+		providers: entries,
+	}
+}
+
+func (s *Service) Notify(ctx context.Context, event *model.Event) {
+	for _, entry := range s.providers {
+		if !entry.filter.Matches(event.Type, string(event.Level)) {
+			continue
+		}
+
+		if err := entry.provider.Send(ctx, event); err != nil {
+			s.logger.Warn("notification send failed",
+				"provider", entry.provider.Name(),
+				"event", event.Type,
+				"error", err,
+			)
+		}
+	}
+}
diff --git a/internal/notification/service_test.go b/internal/notification/service_test.go
new file mode 100644
index 0000000..1c86a84
--- /dev/null
+++ b/internal/notification/service_test.go
@@ -0,0 +1,142 @@
+package notification
+
+import (
+	"context"
+	"errors"
+	"log/slog"
+	"sync"
+	"testing"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type fakeProvider struct {
+	name   string
+	mu     sync.Mutex
+	events []model.Event
+	err    error
+}
+
+func (f *fakeProvider) Name() string { return f.name }
+
+func (f *fakeProvider) Send(_ context.Context, event *model.Event) error {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	f.events = append(f.events, *event)
+	return f.err
+}
+
+func (f *fakeProvider) received() []model.Event {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	cp := make([]model.Event, len(f.events))
+	copy(cp, f.events)
+	return cp
+}
+
+func TestService_Notify(t *testing.T) {
+	event := model.Event{
+		Type:      "daemon.start",
+		Level:     model.LevelInfo,
+		Message:   "started",
+		Timestamp: time.Now(),
+	}
+
+	t.Run("sends to all matching providers", func(t *testing.T) {
+		p1 := &fakeProvider{name: "p1"}
+		p2 := &fakeProvider{name: "p2"}
+
+		svc := New(
+			[]Provider{p1, p2},
+			map[string]EventFilter{},
+			slog.Default(),
+		)
+
+		svc.Notify(context.Background(), &event)
+
+		if len(p1.received()) != 1 {
+			t.Errorf("p1 got %d events, want 1", len(p1.received()))
+		}
+		if len(p2.received()) != 1 {
+			t.Errorf("p2 got %d events, want 1", len(p2.received()))
+		}
+	})
+
+	t.Run("filters events per provider", func(t *testing.T) {
+		p1 := &fakeProvider{name: "p1"}
+		p2 := &fakeProvider{name: "p2"}
+
+		svc := New(
+			[]Provider{p1, p2},
+			map[string]EventFilter{
+				"p1": {Patterns: []string{"daemon.*"}},
+				"p2": {Patterns: []string{"health.*"}},
+			},
+			slog.Default(),
+		)
+
+		svc.Notify(context.Background(), &event)
+
+		if len(p1.received()) != 1 {
+			t.Errorf("p1 got %d events, want 1", len(p1.received()))
+		}
+		if len(p2.received()) != 0 {
+			t.Errorf("p2 got %d events, want 0", len(p2.received()))
+		}
+	})
+
+	t.Run("no filter means all events", func(t *testing.T) {
+		p := &fakeProvider{name: "p1"}
+
+		svc := New(
+			[]Provider{p},
+			map[string]EventFilter{},
+			slog.Default(),
+		)
+
+		svc.Notify(context.Background(), &event)
+
+		if len(p.received()) != 1 {
+			t.Errorf("got %d events, want 1", len(p.received()))
+		}
+	})
+
+	t.Run("provider error is logged not propagated", func(t *testing.T) {
+		p := &fakeProvider{name: "failing", err: errors.New("connection refused")}
+
+		svc := New(
+			[]Provider{p},
+			map[string]EventFilter{},
+			slog.Default(),
+		)
+
+		svc.Notify(context.Background(), &event)
+
+		if len(p.received()) != 1 {
+			t.Errorf("got %d events, want 1", len(p.received()))
+		}
+	})
+
+	t.Run("continues to next provider after error", func(t *testing.T) {
+		p1 := &fakeProvider{name: "fail", err: errors.New("boom")}
+		p2 := &fakeProvider{name: "ok"}
+
+		svc := New(
+			[]Provider{p1, p2},
+			map[string]EventFilter{},
+			slog.Default(),
+		)
+
+		svc.Notify(context.Background(), &event)
+
+		if len(p2.received()) != 1 {
+			t.Errorf("p2 got %d events, want 1", len(p2.received()))
+		}
+	})
+
+	t.Run("no providers does not panic", func(t *testing.T) {
+		svc := New(nil, nil, slog.Default())
+		svc.Notify(context.Background(), &event)
+	})
+}
diff --git a/internal/notification/webhook.go b/internal/notification/webhook.go
new file mode 100644
index 0000000..bf7a735
--- /dev/null
+++ b/internal/notification/webhook.go
@@ -0,0 +1,68 @@
+package notification
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"fmt"
+	"net/http"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+type WebhookConfig struct {
+	URL     string
+	Method  string
+	Headers map[string]string
+}
+
+type WebhookProvider struct {
+	cfg    WebhookConfig
+	client *http.Client
+}
+
+func NewWebhook(cfg WebhookConfig) *WebhookProvider {
+	method := cfg.Method
+	if method == "" {
+		method = http.MethodPost
+	}
+	return &WebhookProvider{
+		cfg: WebhookConfig{
+			URL:     cfg.URL,
+			Method:  method,
+			Headers: cfg.Headers,
+		},
+		client: &http.Client{},
+	}
+}
+
+func (w *WebhookProvider) Name() string { return "webhook" }
+
+func (w *WebhookProvider) Send(ctx context.Context, event *model.Event) error {
+	body, err := json.Marshal(event)
+	if err != nil {
+		return fmt.Errorf("marshal webhook payload: %w", err)
+	}
+
+	req, err := http.NewRequestWithContext(ctx, w.cfg.Method, w.cfg.URL, bytes.NewReader(body))
+	if err != nil {
+		return fmt.Errorf("create webhook request: %w", err)
+	}
+
+	for k, v := range w.cfg.Headers {
+		req.Header.Set(k, v)
+	}
+	req.Header.Set("Content-Type", "application/json")
+
+	resp, err := w.client.Do(req)
+	if err != nil {
+		return fmt.Errorf("send webhook: %w", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+		return fmt.Errorf("webhook returned status %d", resp.StatusCode)
+	}
+
+	return nil
+}
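Review note (webhook.go): NewWebhook builds its client as `&http.Client{}`, whose zero Timeout means "no timeout", so a stalled endpoint holds Send until the caller's context is cancelled. The sketch below shows one way to bound delivery time. The helper name `newBoundedClient` and the 10-second default are assumptions for illustration and are not part of this change.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// defaultWebhookTimeout is an assumed value for illustration; the change under
// review does not define one.
const defaultWebhookTimeout = 10 * time.Second

// newBoundedClient (hypothetical helper) returns an http.Client whose requests
// fail once the timeout elapses. Non-positive values fall back to the default.
func newBoundedClient(timeout time.Duration) *http.Client {
	if timeout <= 0 {
		timeout = defaultWebhookTimeout
	}
	return &http.Client{Timeout: timeout}
}

func main() {
	fmt.Println(newBoundedClient(0).Timeout)
	fmt.Println(newBoundedClient(3 * time.Second).Timeout)
}
```

The client timeout covers the whole exchange (connect, request, response body), and it composes with the context already passed to http.NewRequestWithContext: whichever limit is reached first ends the request.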
diff --git a/internal/notification/webhook_test.go b/internal/notification/webhook_test.go
new file mode 100644
index 0000000..491f8e3
--- /dev/null
+++ b/internal/notification/webhook_test.go
@@ -0,0 +1,144 @@
+package notification
+
+import (
+	"context"
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+	"testing"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+func TestWebhookProvider_Name(t *testing.T) {
+	w := NewWebhook(WebhookConfig{})
+	if w.Name() != "webhook" {
+		t.Errorf("Name() = %q, want %q", w.Name(), "webhook")
+	}
+}
+
+func TestWebhookProvider_Send(t *testing.T) {
+	baseEvent := model.Event{
+		Type:      "runner.started",
+		Level:     model.LevelInfo,
+		Group:     "ci",
+		Runner:    "runner-x1",
+		Message:   "Runner started",
+		Details:   map[string]string{"version": "2.320"},
+		Timestamp: time.Date(2025, 3, 10, 8, 0, 0, 0, time.UTC),
+	}
+
+	t.Run("sends JSON payload with POST", func(t *testing.T) {
+		var received model.Event
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			if r.Method != http.MethodPost {
+				t.Errorf("method = %s, want POST", r.Method)
+			}
+			if ct := r.Header.Get("Content-Type"); ct != "application/json" {
+				t.Errorf("Content-Type = %q, want application/json", ct)
+			}
+			if err := json.NewDecoder(r.Body).Decode(&received); err != nil {
+				t.Errorf("decode body: %v", err)
+			}
+			w.WriteHeader(http.StatusOK)
+		}))
+		defer srv.Close()
+
+		wp := NewWebhook(WebhookConfig{URL: srv.URL})
+		err := wp.Send(context.Background(), &baseEvent)
+		if err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		if received.Type != "runner.started" {
+			t.Errorf("type = %q, want %q", received.Type, "runner.started")
+		}
+		if received.Message != "Runner started" {
+			t.Errorf("message = %q, want %q", received.Message, "Runner started")
+		}
+	})
+
+	t.Run("uses configured method", func(t *testing.T) {
+		var gotMethod string
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			gotMethod = r.Method
+			w.WriteHeader(http.StatusOK)
+		}))
+		defer srv.Close()
+
+		wp := NewWebhook(WebhookConfig{URL: srv.URL, Method: http.MethodPut})
+		if err := wp.Send(context.Background(), &baseEvent); err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		if gotMethod != http.MethodPut {
+			t.Errorf("method = %s, want PUT", gotMethod)
+		}
+	})
+
+	t.Run("sets configured headers", func(t *testing.T) {
+		var gotAuth string
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			gotAuth = r.Header.Get("Authorization")
+			w.WriteHeader(http.StatusOK)
+		}))
+		defer srv.Close()
+
+		wp := NewWebhook(WebhookConfig{
+			URL:     srv.URL,
+			Headers: map[string]string{"Authorization": "Bearer tok123"},
+		})
+
+		if err := wp.Send(context.Background(), &baseEvent); err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		if gotAuth != "Bearer tok123" {
+			t.Errorf("Authorization = %q, want %q", gotAuth, "Bearer tok123")
+		}
+	})
+
+	t.Run("non-2xx returns error", func(t *testing.T) {
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			w.WriteHeader(http.StatusForbidden)
+		}))
+		defer srv.Close()
+
+		wp := NewWebhook(WebhookConfig{URL: srv.URL})
+		err := wp.Send(context.Background(), &baseEvent)
+		if err == nil {
+			t.Fatal("expected error for 403")
+		}
+	})
+
+	t.Run("defaults to POST when method empty", func(t *testing.T) {
+		var gotMethod string
+
+		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			gotMethod = r.Method
+			w.WriteHeader(http.StatusOK)
+		}))
+		defer srv.Close()
+
+		wp := NewWebhook(WebhookConfig{URL: srv.URL, Method: ""})
+		if err := wp.Send(context.Background(), &baseEvent); err != nil {
+			t.Fatalf("Send() error = %v", err)
+		}
+
+		if gotMethod != http.MethodPost {
+			t.Errorf("method = %s, want POST", gotMethod)
+		}
+	})
+
+	t.Run("connection error returns wrapped error", func(t *testing.T) {
+		wp := NewWebhook(WebhookConfig{URL: "http://127.0.0.1:1"})
+		err := wp.Send(context.Background(), &baseEvent)
+		if err == nil {
+			t.Fatal("expected error for unreachable host")
+		}
+	})
+}
diff --git a/internal/runner/binary.go b/internal/runner/binary.go
new file mode 100644
index 0000000..983b304
--- /dev/null
+++ b/internal/runner/binary.go
@@ -0,0 +1,102 @@
+package runner
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"log/slog"
+	"net/http"
+	"os"
+	"path/filepath"
+	"runtime"
+	"strings"
+)
+
+type BinaryManager struct {
+	cacheDir   string
+	logger     *slog.Logger
+	httpClient *http.Client
+}
+
+func NewBinaryManager(cacheDir string, logger *slog.Logger) *BinaryManager {
+	return &BinaryManager{
+		cacheDir:   cacheDir,
+		logger:     logger,
+		httpClient: &http.Client{},
+	}
+}
+
+func (m *BinaryManager) EnsureBits(ctx context.Context, version string) (string, error) {
+	resolved := version
+	if resolved == "latest" {
+		v, err := m.resolveLatestVersion(ctx)
+		if err != nil {
+			return "", fmt.Errorf("resolve latest runner version: %w", err)
+		}
+		resolved = v
+		m.logger.InfoContext(ctx, "resolved latest runner version", "version", resolved)
+	}
+
+	destDir := filepath.Join(m.cacheDir, resolved)
+	runShPath := filepath.Join(destDir, "run.sh")
+
+	if _, err := os.Stat(runShPath); err == nil {
+		m.logger.DebugContext(ctx, "runner binary cached", "version", resolved, "path", destDir)
+		return destDir, nil
+	}
+
+	m.logger.InfoContext(ctx, "downloading runner binary", "version", resolved)
+
+	if err := os.MkdirAll(destDir, 0o755); err != nil {
+		return "", fmt.Errorf("create cache dir %s: %w", destDir, err)
+	}
+
+	if err := downloadAndExtract(ctx, m.httpClient, resolved, destDir); err != nil {
+		rmErr := os.RemoveAll(destDir)
+		if rmErr != nil {
+			m.logger.WarnContext(ctx, "failed to clean partial download", "path", destDir, "error", rmErr)
+		}
+		return "", fmt.Errorf("download runner %s: %w", resolved, err)
+	}
+
+	m.logger.InfoContext(ctx, "runner binary ready", "version", resolved, "path", destDir)
+	return destDir, nil
+}
+
+func (m *BinaryManager) resolveLatestVersion(ctx context.Context) (string, error) {
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.github.com/repos/actions/runner/releases/latest", http.NoBody)
+	if err != nil {
+		return "", fmt.Errorf("create request: %w", err)
+	}
+	req.Header.Set("Accept", "application/vnd.github+json")
+
+	resp, err := m.httpClient.Do(req)
+	if err != nil {
+		return "", fmt.Errorf("fetch latest release: %w", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		return "", fmt.Errorf("github releases API returned %d", resp.StatusCode)
+	}
+
+	var release struct {
+		TagName string `json:"tag_name"`
+	}
+	if err := json.NewDecoder(resp.Body).Decode(&release); err != nil {
+		return "", fmt.Errorf("decode release response: %w", err)
+	}
+
+	if release.TagName == "" {
+		return "", fmt.Errorf("empty tag_name in release response")
+	}
+
+	return strings.TrimPrefix(release.TagName, "v"), nil
+}
+
+func runnerArch() string {
+	if runtime.GOARCH == "arm64" {
+		return "arm64"
+	}
+	return "x64"
+}
diff --git a/internal/runner/binary_test.go b/internal/runner/binary_test.go
new file mode 100644
index 0000000..c651851
--- /dev/null
+++ b/internal/runner/binary_test.go
@@ -0,0 +1,272 @@
+package runner
+
+import (
+	"archive/tar"
+	"compress/gzip"
+	"context"
+	"encoding/json"
+	"log/slog"
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"path/filepath"
+	"runtime"
+	"testing"
+)
+
+func silentLogger() *slog.Logger {
+	return slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError + 1}))
+}
+
+func createFakeTarGz(t *testing.T) string {
+	t.Helper()
+
+	tmpFile := filepath.Join(t.TempDir(), "runner.tar.gz")
+	f, err := os.Create(tmpFile)
+	if err != nil {
+		t.Fatalf("create tar.gz file: %v", err)
+	}
+	defer f.Close()
+
+	gw := gzip.NewWriter(f)
+	tw := tar.NewWriter(gw)
+
+	content := []byte("#!/bin/bash\necho hello\n")
+	if err := tw.WriteHeader(&tar.Header{
+		Name:     "run.sh",
+		Mode:     0o755,
+		Size:     int64(len(content)),
+		Typeflag: tar.TypeReg,
+	}); err != nil {
+		t.Fatalf("write tar header: %v", err)
+	}
+	if _, err := tw.Write(content); err != nil {
+		t.Fatalf("write tar content: %v", err)
+	}
+
+	subContent := []byte("config data\n")
+	if err := tw.WriteHeader(&tar.Header{
+		Name:     "config.sh",
+		Mode:     0o755,
+		Size:     int64(len(subContent)),
+		Typeflag: tar.TypeReg,
+	}); err != nil {
+		t.Fatalf("write tar header: %v", err)
+	}
+	if _, err := tw.Write(subContent); err != nil {
+		t.Fatalf("write tar content: %v", err)
+	}
+
+	if err := tw.Close(); err != nil {
+		t.Fatalf("close tar writer: %v", err)
+	}
+	if err := gw.Close(); err != nil {
+		t.Fatalf("close gzip writer: %v", err)
+	}
+
+	return tmpFile
+}
+
+func TestEnsureBits_Cached(t *testing.T) {
+	cacheDir := t.TempDir()
+	version := "2.320.0"
+
+	versionDir := filepath.Join(cacheDir, version)
+	if err := os.MkdirAll(versionDir, 0o755); err != nil {
+		t.Fatalf("create version dir: %v", err)
+	}
+	runSh := filepath.Join(versionDir, "run.sh")
+	if err := os.WriteFile(runSh, []byte("#!/bin/bash\n"), 0o755); err != nil {
+		t.Fatalf("write run.sh: %v", err)
+	}
+
+	bm := NewBinaryManager(cacheDir, silentLogger())
+
+	got, err := bm.EnsureBits(context.Background(), version)
+	if err != nil {
+		t.Fatalf("EnsureBits: %v", err)
+	}
+
+	if got != versionDir {
+		t.Fatalf("expected path %q, got %q", versionDir, got)
+	}
+}
+
+func TestEnsureBits_Download(t *testing.T) {
+	tarGzPath := createFakeTarGz(t)
+	tarGzData, err := os.ReadFile(tarGzPath)
+	if err != nil {
+		t.Fatalf("read tar.gz: %v", err)
+	}
+
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.Header().Set("Content-Type", "application/gzip")
+		if _, writeErr := w.Write(tarGzData); writeErr != nil {
+			t.Errorf("write response: %v", writeErr)
+		}
+	}))
+	defer srv.Close()
+
+	cacheDir := t.TempDir()
+	version := "2.320.0"
+	versionDir := filepath.Join(cacheDir, version)
+
+	if err := os.MkdirAll(versionDir, 0o755); err != nil {
+		t.Fatalf("create version dir: %v", err)
+	}
+
+	req, err := http.NewRequestWithContext(context.Background(), http.MethodGet, srv.URL+"/runner.tar.gz", nil)
+	if err != nil {
+		t.Fatalf("create request: %v", err)
+	}
+	resp, err := srv.Client().Do(req)
+	if err != nil {
+		t.Fatalf("do request: %v", err)
+	}
+	defer resp.Body.Close()
+
+	if err := extractTarGz(resp.Body, versionDir); err != nil {
+		t.Fatalf("extract tar.gz: %v", err)
+	}
+
+	runSh := filepath.Join(versionDir, "run.sh")
+	if _, statErr := os.Stat(runSh); statErr != nil {
+		t.Fatalf("run.sh not found after extraction: %v", statErr)
+	}
+
+	configSh := filepath.Join(versionDir, "config.sh")
+	if _, statErr := os.Stat(configSh); statErr != nil {
+		t.Fatalf("config.sh not found after extraction: %v", statErr)
+	}
+
+	bm := NewBinaryManager(cacheDir, silentLogger())
+	got, err := bm.EnsureBits(context.Background(), version)
+	if err != nil {
+		t.Fatalf("EnsureBits on extracted dir: %v", err)
+	}
+	if got != versionDir {
+		t.Fatalf("expected path %q, got %q", versionDir, got)
+	}
+}
+
+func TestRunnerArch(t *testing.T) {
+	got := runnerArch()
+	switch runtime.GOARCH {
+	case "arm64":
+		if got != "arm64" {
+			t.Fatalf("expected arm64, got %q", got)
+		}
+	default:
+		if got != "x64" {
+			t.Fatalf("expected x64, got %q", got)
+		}
+	}
+}
+
+func TestResolveLatestVersion(t *testing.T) {
+	tests := []struct {
+		name       string
+		response   any
+		statusCode int
+		wantVer    string
+		wantErr    bool
+	}{
+		{
+			name:       "valid release with v prefix",
+			response:   map[string]string{"tag_name": "v2.320.0"},
+			statusCode: http.StatusOK,
+			wantVer:    "2.320.0",
+		},
+		{
+			name:       "valid release without v prefix",
+			response:   map[string]string{"tag_name": "2.321.0"},
+			statusCode: http.StatusOK,
+			wantVer:    "2.321.0",
+		},
+		{
+			name:       "empty tag_name",
+			response:   map[string]string{"tag_name": ""},
+			statusCode: http.StatusOK,
+			wantErr:    true,
+		},
+		{
+			name:       "api error",
+			response:   nil,
+			statusCode: http.StatusInternalServerError,
+			wantErr:    true,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+				w.WriteHeader(tt.statusCode)
+				if tt.response != nil {
+					data, jsonErr := json.Marshal(tt.response)
+					if jsonErr != nil {
+						t.Errorf("marshal response: %v", jsonErr)
+					}
+					if _, writeErr := w.Write(data); writeErr != nil {
+						t.Errorf("write response: %v", writeErr)
+					}
+				}
+			}))
+			defer srv.Close()
+
+			bm := &BinaryManager{
+				cacheDir:   t.TempDir(),
+				logger:     silentLogger(),
+				httpClient: srv.Client(),
+			}
+
+			ctx := context.Background()
+			req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
+			if err != nil {
+				t.Fatalf("create request: %v", err)
+			}
+			req.Header.Set("Accept", "application/vnd.github+json")
+
+			resp, err := bm.httpClient.Do(req)
+			if err != nil {
+				t.Fatalf("do request: %v", err)
+			}
+			defer resp.Body.Close()
+
+			if resp.StatusCode != http.StatusOK {
+				if !tt.wantErr {
+					t.Fatalf("unexpected status %d", resp.StatusCode)
+				}
+				return
+			}
+
+			var release struct {
+				TagName string `json:"tag_name"`
+			}
+			if decodeErr := json.NewDecoder(resp.Body).Decode(&release); decodeErr != nil {
+				if !tt.wantErr {
+					t.Fatalf("decode: %v", decodeErr)
+				}
+				return
+			}
+
+			if release.TagName == "" {
+				if !tt.wantErr {
+					t.Fatal("empty tag_name")
+				}
+				return
+			}
+
+			got := release.TagName
+			if len(got) > 0 && got[0] == 'v' {
+				got = got[1:]
+			}
+
+			if tt.wantErr {
+				t.Fatal("expected error but got none")
+			}
+			if got != tt.wantVer {
+				t.Fatalf("expected version %q, got %q", tt.wantVer, got)
+			}
+		})
+	}
+}
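Review note (binary_test.go): TestResolveLatestVersion repeats the request and decode steps inline instead of calling resolveLatestVersion, because the method hardcodes the GitHub URL. A possible alternative, sketched under the assumption that the endpoint is passed in as a parameter, lets a test drive the real logic against an httptest server. The names `resolveLatest` and `resolveFromFake` are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// resolveLatest follows the same steps as resolveLatestVersion but takes the
// endpoint as a parameter (an assumption; the reviewed code uses a fixed URL).
func resolveLatest(ctx context.Context, client *http.Client, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
	if err != nil {
		return "", fmt.Errorf("create request: %w", err)
	}
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := client.Do(req)
	if err != nil {
		return "", fmt.Errorf("fetch latest release: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("releases API returned %d", resp.StatusCode)
	}

	var release struct {
		TagName string `json:"tag_name"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&release); err != nil {
		return "", fmt.Errorf("decode release response: %w", err)
	}
	if release.TagName == "" {
		return "", fmt.Errorf("empty tag_name in release response")
	}
	return strings.TrimPrefix(release.TagName, "v"), nil
}

// resolveFromFake starts a throwaway server that answers with the given status
// and tag, then runs resolveLatest against it.
func resolveFromFake(status int, tag string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
		_ = json.NewEncoder(w).Encode(map[string]string{"tag_name": tag})
	}))
	defer srv.Close()
	return resolveLatest(context.Background(), srv.Client(), srv.URL)
}

func main() {
	version, err := resolveFromFake(http.StatusOK, "v2.320.0")
	fmt.Println(version, err)
}
```

With the URL injectable, each row of the existing table could call the function under test directly, so a regression in the production decode path would fail the test.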
diff --git a/internal/runner/cleanup.go b/internal/runner/cleanup.go
new file mode 100644
index 0000000..8135a5a
--- /dev/null
+++ b/internal/runner/cleanup.go
@@ -0,0 +1,109 @@
+package runner
+
+import (
+	"context"
+	"fmt"
+	"os"
+	"os/exec"
+	"path/filepath"
+	"strconv"
+	"strings"
+	"syscall"
+)
+
+func (m *ProcessManager) CleanupStale(ctx context.Context) error {
+	entries, err := os.ReadDir(m.workdirBase)
+	if err != nil {
+		if os.IsNotExist(err) {
+			return nil
+		}
+		return fmt.Errorf("read workdir base %s: %w", m.workdirBase, err)
+	}
+
+	for _, groupEntry := range entries {
+		if !groupEntry.IsDir() {
+			continue
+		}
+		if err := m.cleanupStaleGroup(ctx, groupEntry.Name()); err != nil {
+			m.logger.WarnContext(ctx, "failed to cleanup stale group", "group", groupEntry.Name(), "error", err)
+		}
+	}
+
+	return nil
+}
+
+func (m *ProcessManager) cleanupStaleGroup(ctx context.Context, group string) error {
+	groupDir := filepath.Join(m.workdirBase, group)
+	entries, err := os.ReadDir(groupDir)
+	if err != nil {
+		return fmt.Errorf("read group dir %s: %w", groupDir, err)
+	}
+
+	for _, runnerEntry := range entries {
+		if !runnerEntry.IsDir() {
+			continue
+		}
+		m.cleanupStaleRunner(ctx, group, runnerEntry.Name())
+	}
+
+	return nil
+}
+
+func (m *ProcessManager) cleanupStaleRunner(ctx context.Context, group, runner string) {
+	runnerDir := filepath.Join(m.workdirBase, group, runner)
+	pidFile := filepath.Join(runnerDir, ".ghr-pid")
+
+	pidBytes, err := os.ReadFile(pidFile)
+	if err != nil {
+		m.logger.DebugContext(ctx, "no PID file found, removing stale workdir", "dir", runnerDir)
+		removeErr := os.RemoveAll(runnerDir)
+		if removeErr != nil {
+			m.logger.WarnContext(ctx, "failed to remove stale workdir", "dir", runnerDir, "error", removeErr)
+		}
+		return
+	}
+
+	pid, err := strconv.Atoi(strings.TrimSpace(string(pidBytes)))
+	if err != nil || pid <= 0 {
+		m.logger.WarnContext(ctx, "invalid PID file content, removing workdir", "dir", runnerDir, "error", err)
+		removeErr := os.RemoveAll(runnerDir)
+		if removeErr != nil {
+			m.logger.WarnContext(ctx, "failed to remove stale workdir", "dir", runnerDir, "error", removeErr)
+		}
+		return
+	}
+
+	if isProcessAlive(pid) {
+		m.logger.WarnContext(ctx, "killing stale runner process", "pid", pid, "runner", runner, "group", group)
+		killErr := syscall.Kill(pid, syscall.SIGKILL)
+		if killErr != nil {
+			m.logger.WarnContext(ctx, "failed to kill stale process", "pid", pid, "error", killErr)
+		}
+	}
+
+	removeErr := os.RemoveAll(runnerDir)
+	if removeErr != nil {
+		m.logger.WarnContext(ctx, "failed to remove stale workdir", "dir", runnerDir, "error", removeErr)
+	} else {
+		m.logger.InfoContext(ctx, "cleaned up stale runner", "runner", runner, "group", group, "pid", pid)
+	}
+}
+
+func (m *ProcessManager) KillOrphanRunners(ctx context.Context) {
+	out, err := exec.CommandContext(ctx, "pgrep", "-f", m.workdirBase).Output()
+	if err != nil {
+		return
+	}
+	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
+		pid, err := strconv.Atoi(strings.TrimSpace(line))
+		if err != nil || pid <= 0 || pid == os.Getpid() {
+			continue
+		}
+		m.logger.WarnContext(ctx, "killing orphan runner process", "pid", pid)
+		_ = syscall.Kill(pid, syscall.SIGKILL)
+	}
+}
+
+func isProcessAlive(pid int) bool {
+	return syscall.Kill(pid, 0) == nil
+}
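Review note (cleanup.go): cleanupStaleRunner hands the parsed PID to syscall.Kill. On POSIX systems kill(2) treats pid 0 as the caller's own process group and pid -1 as every process the caller may signal, so PID file content is safest when accepted only if strictly positive. A minimal sketch of that validation, using a hypothetical `parsePID` helper that is not part of this change:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePID (hypothetical helper) converts PID file content into a positive PID.
// Zero and negative values are rejected because kill(2) interprets them as
// process-group or broadcast targets rather than a single process.
func parsePID(content []byte) (int, error) {
	pid, err := strconv.Atoi(strings.TrimSpace(string(content)))
	if err != nil {
		return 0, fmt.Errorf("parse pid: %w", err)
	}
	if pid <= 0 {
		return 0, fmt.Errorf("pid %d is not a single-process target", pid)
	}
	return pid, nil
}

func main() {
	for _, raw := range []string{"4242\n", "0", "-1", "abc"} {
		pid, err := parsePID([]byte(raw))
		fmt.Printf("%q -> pid=%d err=%v\n", raw, pid, err)
	}
}
```

Keeping the check in one function means the stale-workdir path and any future reader of `.ghr-pid` share the same rule.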
diff --git a/internal/runner/cleanup_test.go b/internal/runner/cleanup_test.go
new file mode 100644
index 0000000..b1cf252
--- /dev/null
+++ b/internal/runner/cleanup_test.go
@@ -0,0 +1,67 @@
+package runner
+
+import (
+	"context"
+	"os"
+	"path/filepath"
+	"testing"
+)
+
+func TestCleanupStale_DeadProcess(t *testing.T) {
+	workdirBase := t.TempDir()
+
+	groupDir := filepath.Join(workdirBase, "group-a")
+	runnerDir := filepath.Join(groupDir, "runner-1")
+	if err := os.MkdirAll(runnerDir, 0o755); err != nil {
+		t.Fatalf("create runner dir: %v", err)
+	}
+
+	pidFile := filepath.Join(runnerDir, ".ghr-pid")
+	if err := os.WriteFile(pidFile, []byte("9999999"), 0o644); err != nil {
+		t.Fatalf("write PID file: %v", err)
+	}
+
+	pm := NewProcessManager(workdirBase, silentLogger())
+	if err := pm.CleanupStale(context.Background()); err != nil {
+		t.Fatalf("CleanupStale: %v", err)
+	}
+
+	if _, err := os.Stat(runnerDir); !os.IsNotExist(err) {
+		t.Fatalf("expected runner dir to be removed, stat returned: %v", err)
+	}
+}
+
+func TestCleanupStale_EmptyDir(t *testing.T) {
+	workdirBase := t.TempDir()
+
+	pm := NewProcessManager(workdirBase, silentLogger())
+	if err := pm.CleanupStale(context.Background()); err != nil {
+		t.Fatalf("CleanupStale on empty dir: %v", err)
+	}
+}
+
+func TestCleanupStale_NonexistentDir(t *testing.T) {
+	pm := NewProcessManager("/nonexistent/path/that/does/not/exist", silentLogger())
+	if err := pm.CleanupStale(context.Background()); err != nil {
+		t.Fatalf("CleanupStale on nonexistent dir: %v", err)
+	}
+}
+
+func TestCleanupStale_NoPidFile(t *testing.T) {
+	workdirBase := t.TempDir()
+
+	groupDir := filepath.Join(workdirBase, "group-b")
+	runnerDir := filepath.Join(groupDir, "runner-orphan")
+	if err := os.MkdirAll(runnerDir, 0o755); err != nil {
+		t.Fatalf("create runner dir: %v", err)
+	}
+
+	pm := NewProcessManager(workdirBase, silentLogger())
+	if err := pm.CleanupStale(context.Background()); err != nil {
+		t.Fatalf("CleanupStale: %v", err)
+	}
+
+	if _, err := os.Stat(runnerDir); !os.IsNotExist(err) {
+		t.Fatalf("expected runner dir without PID file to be removed, stat returned: %v", err)
+	}
+}
diff --git a/internal/runner/copy.go b/internal/runner/copy.go
new file mode 100644
index 0000000..cef7538
--- /dev/null
+++ b/internal/runner/copy.go
@@ -0,0 +1,57 @@
+package runner
+
+import (
+	"fmt"
+	"io"
+	"os"
+	"path/filepath"
+)
+
+func copyDir(src, dst string) error {
+	return filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
+		if err != nil {
+			return fmt.Errorf("walk source %s: %w", path, err)
+		}
+
+		relPath, err := filepath.Rel(src, path)
+		if err != nil {
+			return fmt.Errorf("compute relative path for %s: %w", path, err)
+		}
+
+		targetPath := filepath.Join(dst, relPath)
+
+		if info.IsDir() {
+			return os.MkdirAll(targetPath, info.Mode())
+		}
+
+		if info.Mode()&os.ModeSymlink != 0 {
+			link, err := os.Readlink(path)
+			if err != nil {
+				return fmt.Errorf("read symlink %s: %w", path, err)
+			}
+			return os.Symlink(link, targetPath)
+		}
+
+		return copyFile(path, targetPath, info.Mode())
+	})
+}
+
+func copyFile(src, dst string, mode os.FileMode) error {
+	in, err := os.Open(src)
+	if err != nil {
+		return fmt.Errorf("open source %s: %w", src, err)
+	}
+	defer in.Close()
+
+	out, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, mode)
+	if err != nil {
+		return fmt.Errorf("create dest %s: %w", dst, err)
+	}
+	defer out.Close()
+
+	if _, err := io.Copy(out, in); err != nil {
+		return fmt.Errorf("copy %s to %s: %w", src, dst, err)
+	}
+
+	return nil
+}
diff --git a/internal/runner/download.go b/internal/runner/download.go
new file mode 100644
index 0000000..0e66ba9
--- /dev/null
+++ b/internal/runner/download.go
@@ -0,0 +1,109 @@
+package runner
+
+import (
+	"archive/tar"
+	"compress/gzip"
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"path/filepath"
+	"strings"
+)
+
+const downloadURLTemplate = "https://github.com/actions/runner/releases/download/v%s/actions-runner-osx-%s-%s.tar.gz"
+
+func downloadAndExtract(ctx context.Context, client *http.Client, version, destDir string) error {
+	url := fmt.Sprintf(downloadURLTemplate, version, runnerArch(), version)
+
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
+	if err != nil {
+		return fmt.Errorf("create download request: %w", err)
+	}
+
+	resp, err := client.Do(req)
+	if err != nil {
+		return fmt.Errorf("download tarball: %w", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		return fmt.Errorf("download returned HTTP %d for %s", resp.StatusCode, url)
+	}
+
+	return extractTarGz(resp.Body, destDir)
+}
+
+func extractTarGz(r io.Reader, destDir string) error {
+	gz, err := gzip.NewReader(r)
+	if err != nil {
+		return fmt.Errorf("open gzip reader: %w", err)
+	}
+	defer gz.Close()
+
+	tr := tar.NewReader(gz)
+	for {
+		header, err := tr.Next()
+		if errors.Is(err, io.EOF) {
+			break
+		}
+		if err != nil {
+			return fmt.Errorf("read tar entry: %w", err)
+		}
+
+		target, err := sanitizeTarPath(destDir, header.Name)
+		if err != nil {
+			return err
+		}
+
+		switch header.Typeflag {
+		case tar.TypeDir:
+			if err := os.MkdirAll(target, os.FileMode(header.Mode)); err != nil {
+				return fmt.Errorf("create directory %s: %w", target, err)
+			}
+		case tar.TypeReg:
+			if err := extractFile(tr, target, os.FileMode(header.Mode)); err != nil {
+				return err
+			}
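+		case tar.TypeSymlink: // link targets resolve relative to the link's own directory
+			rel, relErr := filepath.Rel(destDir, filepath.Join(filepath.Dir(target), header.Linkname))
+			if relErr != nil || filepath.IsAbs(header.Linkname) || rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
+				return fmt.Errorf("tar symlink %q -> %q escapes destination directory", header.Name, header.Linkname)
+			}
+			if err := os.Symlink(header.Linkname, target); err != nil {
+				return fmt.Errorf("create symlink %s: %w", target, err)
+			}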
+		}
+	}
+
+	return nil
+}
+
+func extractFile(r io.Reader, path string, mode os.FileMode) error {
+	dir := filepath.Dir(path)
+	if err := os.MkdirAll(dir, 0o755); err != nil {
+		return fmt.Errorf("create parent dir for %s: %w", path, err)
+	}
+
+	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, mode)
+	if err != nil {
+		return fmt.Errorf("create file %s: %w", path, err)
+	}
+	defer f.Close()
+
+	if _, err := io.Copy(f, r); err != nil {
+		return fmt.Errorf("write file %s: %w", path, err)
+	}
+
+	return nil
+}
+
+func sanitizeTarPath(destDir, name string) (string, error) {
+	target := filepath.Join(destDir, filepath.Clean(name))
+	if !strings.HasPrefix(target, filepath.Clean(destDir)+string(os.PathSeparator)) && target != filepath.Clean(destDir) {
+		return "", fmt.Errorf("tar entry %q escapes destination directory", name)
+	}
+	return target, nil
+}
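Review note (download.go): a tar symlink's Linkname is interpreted relative to the directory that contains the link, not relative to destDir, so a containment check has to resolve it from the link's own location. The sketch below illustrates that check as a pure function. The helper name `linkStaysInside` and the example paths are assumptions for illustration, and the sketch assumes Unix-style paths.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// linkStaysInside (hypothetical helper) reports whether a symlink created at
// linkPath with target linkname resolves to a location inside destDir.
// The check is lexical only; it does not follow links already on disk.
func linkStaysInside(destDir, linkPath, linkname string) bool {
	if filepath.IsAbs(linkname) {
		return false
	}
	// Resolve the target from the directory holding the link.
	resolved := filepath.Join(filepath.Dir(linkPath), linkname)
	rel, err := filepath.Rel(destDir, resolved)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	dest := "/cache/2.320.0"
	fmt.Println(linkStaysInside(dest, dest+"/bin/node", "../externals/node"))
	fmt.Println(linkStaysInside(dest, dest+"/bin/evil", "../../other/file"))
	fmt.Println(linkStaysInside(dest, dest+"/evil", "/etc/passwd"))
}
```

Creating the link with the original relative Linkname, rather than a path joined onto destDir, also keeps the extracted tree relocatable when it is later copied to a runner workdir.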
diff --git a/internal/runner/process.go b/internal/runner/process.go
new file mode 100644
index 0000000..60b6451
--- /dev/null
+++ b/internal/runner/process.go
@@ -0,0 +1,137 @@
+package runner
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"log/slog"
+	"os"
+	"os/exec"
+	"path/filepath"
+	"strconv"
+	"syscall"
+	"time"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+const stopGracePeriod = 10 * time.Second
+
+type Process struct {
+	Name      string
+	Group     string
+	WorkDir   string
+	PID       int
+	StartedAt time.Time
+	Cmd       *exec.Cmd
+}
+
+type ProcessManager struct {
+	workdirBase string
+	logger      *slog.Logger
+}
+
+func NewProcessManager(workdirBase string, logger *slog.Logger) *ProcessManager {
+	return &ProcessManager{
+		workdirBase: workdirBase,
+		logger:      logger,
+	}
+}
+
+func (m *ProcessManager) Prepare(ctx context.Context, instance *model.RunnerInstance, cachedDir string) (string, error) {
+	workdir := filepath.Join(m.workdirBase, instance.Group, instance.Name)
+
+	if err := os.MkdirAll(workdir, 0o755); err != nil {
+		return "", fmt.Errorf("create workdir %s: %w", workdir, err)
+	}
+
+	if err := copyDir(cachedDir, workdir); err != nil {
+		return "", fmt.Errorf("copy runner bits to %s: %w", workdir, err)
+	}
+
+	m.logger.DebugContext(ctx, "prepared runner workdir", "workdir", workdir, "runner", instance.Name)
+	return workdir, nil
+}
+
+func (m *ProcessManager) Start(ctx context.Context, instance *model.RunnerInstance, workdir, jitConfig string, logFile io.Writer) (*Process, error) {
+	runScript := filepath.Join(workdir, "run.sh")
+	cmd := exec.CommandContext(ctx, runScript)
+	cmd.Dir = workdir
+	cmd.Env = append(os.Environ(), "ACTIONS_RUNNER_INPUT_JITCONFIG="+jitConfig)
+	cmd.Stdout = logFile
+	cmd.Stderr = logFile
+
+	if err := cmd.Start(); err != nil {
+		return nil, fmt.Errorf("start runner %s: %w", instance.Name, err)
+	}
+
+	pidFile := filepath.Join(workdir, ".ghr-pid")
+	if err := os.WriteFile(pidFile, []byte(strconv.Itoa(cmd.Process.Pid)), 0o644); err != nil {
+		m.logger.WarnContext(ctx, "failed to write PID file", "path", pidFile, "error", err)
+	}
+
+	m.logger.InfoContext(ctx, "runner started", "runner", instance.Name, "pid", cmd.Process.Pid)
+
+	return &Process{
+		Name:      instance.Name,
+		Group:     instance.Group,
+		WorkDir:   workdir,
+		PID:       cmd.Process.Pid,
+		StartedAt: time.Now(),
+		Cmd:       cmd,
+	}, nil
+}
+
+func (m *ProcessManager) Stop(ctx context.Context, proc *Process) error {
+	if proc.Cmd == nil || proc.Cmd.Process == nil {
+		return nil
+	}
+
+	m.logger.InfoContext(ctx, "stopping runner", "runner", proc.Name, "pid", proc.PID)
+
+	if err := proc.Cmd.Process.Signal(syscall.SIGTERM); err != nil {
+		if isProcessFinished(err) {
+			return nil
+		}
+		return fmt.Errorf("send SIGTERM to runner %s (pid %d): %w", proc.Name, proc.PID, err)
+	}
+
+	done := make(chan error, 1)
+	go func() {
+		done <- proc.Cmd.Wait()
+	}()
+
+	select {
+	case err := <-done:
+		if isExpectedExit(err) {
+			return nil
+		}
+		return err
+	case <-time.After(stopGracePeriod):
+		m.logger.WarnContext(ctx, "runner did not exit after SIGTERM, sending SIGKILL", "runner", proc.Name, "pid", proc.PID)
+		if err := proc.Cmd.Process.Kill(); err != nil {
+			return fmt.Errorf("kill runner %s (pid %d): %w", proc.Name, proc.PID, err)
+		}
+		return <-done
+	}
+}
+
+func isProcessFinished(err error) bool {
+	return errors.Is(err, os.ErrProcessDone)
+}
+
+func isExpectedExit(err error) bool {
+	if err == nil {
+		return true
+	}
+	var exitErr *exec.ExitError
+	return errors.As(err, &exitErr)
+}
+
+func (m *ProcessManager) Cleanup(proc *Process) error {
+	if err := os.RemoveAll(proc.WorkDir); err != nil {
+		return fmt.Errorf("remove workdir %s: %w", proc.WorkDir, err)
+	}
+	return nil
+}
diff --git a/internal/runner/process_test.go b/internal/runner/process_test.go
new file mode 100644
index 0000000..dc457d1
--- /dev/null
+++ b/internal/runner/process_test.go
@@ -0,0 +1,77 @@
+package runner
+
+import (
+	"context"
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/RedBoardDev/gh-runners-tool/v2/internal/model"
+)
+
+func TestPrepare(t *testing.T) {
+	workdirBase := t.TempDir()
+	cachedDir := t.TempDir()
+
+	files := map[string]string{
+		"run.sh":    "#!/bin/bash\necho run\n",
+		"config.sh": "#!/bin/bash\necho config\n",
+	}
+	for name, content := range files {
+		if err := os.WriteFile(filepath.Join(cachedDir, name), []byte(content), 0o755); err != nil {
+			t.Fatalf("write %s: %v", name, err)
+		}
+	}
+
+	pm := NewProcessManager(workdirBase, silentLogger())
+	instance := model.RunnerInstance{
+		ID:    "abc123",
+		Name:  "test-group-abc123",
+		Group: "test-group",
+	}
+
+	workdir, err := pm.Prepare(context.Background(), &instance, cachedDir)
+	if err != nil {
+		t.Fatalf("Prepare: %v", err)
+	}
+
+	expectedDir := filepath.Join(workdirBase, "test-group", "test-group-abc123")
+	if workdir != expectedDir {
+		t.Fatalf("expected workdir %q, got %q", expectedDir, workdir)
+	}
+
+	for name, content := range files {
+		p := filepath.Join(workdir, name)
+		data, readErr := os.ReadFile(p)
+		if readErr != nil {
+			t.Fatalf("read copied file %s: %v", name, readErr)
+		}
+		if string(data) != content {
+			t.Fatalf("file %s content mismatch: got %q, want %q", name, string(data), content)
+		}
+	}
+}
+
+func TestCleanup(t *testing.T) {
+	workdir := t.TempDir()
+	sentinel := filepath.Join(workdir, "run.sh")
+	if err := os.WriteFile(sentinel, []byte("#!/bin/bash\n"), 0o755); err != nil {
+		t.Fatalf("write sentinel: %v", err)
+	}
+
+	proc := &Process{
+		Name:    "test-runner",
+		Group:   "test-group",
+		WorkDir: workdir,
+		PID:     99999,
+	}
+
+	pm := NewProcessManager(filepath.Dir(workdir), silentLogger())
+	if err := pm.Cleanup(proc); err != nil {
+		t.Fatalf("Cleanup: %v", err)
+	}
+
+	if _, err := os.Stat(workdir); !os.IsNotExist(err) {
+		t.Fatalf("expected workdir to be removed, stat returned: %v", err)
+	}
+}
diff --git a/old-version/config.example.yaml b/old-version/config.example.yaml
deleted file mode 100644
index ad673b6..0000000
--- a/old-version/config.example.yaml
+++ /dev/null
@@ -1,20 +0,0 @@
-github:
-  scope: org # or repo
-  owner: your-org
-  # repo: your-repo # required when scope=repo
-
-defaults:
-  workdir_base: /var/lib/ghr/groups
-  cache_dir: /var/lib/ghr/cache
-  version: latest
-
-groups:
-  - name: deploy-api
-    count: 10
-    ephemeral: true
-    labels: [deploy, macos]
-  - name: ci-default
-    count: 5
-    ephemeral: false
-    labels: [ci, macos]
-
diff --git a/old-version/env.example b/old-version/env.example
deleted file mode 100644
index 73723dd..0000000
--- a/old-version/env.example
+++ /dev/null
@@ -1,2 +0,0 @@
-GITHUB_TOKEN=YOUR_GITHUB_PAT_WITH_RUNNER_PERMS
-
diff --git a/old-version/go.mod b/old-version/go.mod
deleted file mode 100644
index 6bc6cd4..0000000
--- a/old-version/go.mod
+++ /dev/null
@@ -1,14 +0,0 @@
-module gh-runners-tool
-
-go 1.24.4
-
-require (
-	github.com/joho/godotenv v1.5.1
-	github.com/spf13/cobra v1.8.0
-	gopkg.in/yaml.v3 v3.0.1
-)
-
-require (
-	github.com/inconshreveable/mousetrap v1.1.0 // indirect
-	github.com/spf13/pflag v1.0.5 // indirect
-)
diff --git a/old-version/go.sum b/old-version/go.sum
deleted file mode 100644
index f2fd08d..0000000
--- a/old-version/go.sum
+++ /dev/null
@@ -1,14 +0,0 @@
-github.com/cpuguy83/go-md2man/v2 v2.0.3/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
-github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
-github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
-github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0=
-github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4=
-github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
-github.com/spf13/cobra v1.8.0 h1:7aJaZx1B85qltLMc546zn58BxxfZdR/W22ej9CFoEf0=
-github.com/spf13/cobra v1.8.0/go.mod h1:WXLWApfZ71AjXPya3WOlMsY9yMs7YeiHhFVlvLyhcho=
-github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
-github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
-gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
-gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
-gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
-gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
diff --git a/old-version/internal/cli/apply.go b/old-version/internal/cli/apply.go
deleted file mode 100644
index 516392a..0000000
--- a/old-version/internal/cli/apply.go
+++ /dev/null
@@ -1,38 +0,0 @@
-package cli
-
-import (
-	"fmt"
-	"os"
-	"strconv"
-	"syscall"
-
-	"gh-runners-tool/internal/config"
-	"github.com/spf13/cobra"
-)
-
-func applyCmd() *cobra.Command {
-	cmd := &cobra.Command{
-		Use:   "apply",
-		Short: "Validate config and signal daemon to reload",
-		RunE: func(cmd *cobra.Command, args []string) error {
-			if _, err := config.Load(configPath); err != nil {
-				return fmt.Errorf("load config %s: %w", configPath, err)
-			}
-
-			pidBytes, err := os.ReadFile(pidFilePath())
-			if err != nil {
-				return fmt.Errorf("read daemon pid from %s: %w", pidFilePath(), err)
-			}
-			pid, err := strconv.Atoi(string(pidBytes))
-			if err != nil {
-				return fmt.Errorf("invalid pid file: %w", err)
-			}
-			if err := syscall.Kill(pid, syscall.SIGHUP); err != nil {
-				return fmt.Errorf("signal daemon: %w", err)
-			}
-			cmd.Println("reload signal sent to daemon")
-			return nil
-		},
-	}
-	return cmd
-}
diff --git a/old-version/internal/cli/daemon.go b/old-version/internal/cli/daemon.go
deleted file mode 100644
index 854edad..0000000
--- a/old-version/internal/cli/daemon.go
+++ /dev/null
@@ -1,118 +0,0 @@
-package cli
-
-import (
-	"context"
-	"fmt"
-	"log"
-	"os"
-	"os/signal"
-	"strings"
-	"syscall"
-	"time"
-
-	"gh-runners-tool/internal/config"
-	"gh-runners-tool/internal/logging"
-	"gh-runners-tool/internal/provider/github"
-	"gh-runners-tool/internal/reconciler"
-	"gh-runners-tool/internal/runner"
-	"github.com/spf13/cobra"
-)
-
-func daemonCmd() *cobra.Command {
-	cmd := &cobra.Command{
-		Use:   "daemon",
-		Short: "Run the controller daemon",
-		RunE:  runDaemon,
-	}
-	return cmd
-}
-
-func runDaemon(cmd *cobra.Command, _ []string) error {
-	logger := logging.New()
-
-	cfg, err := config.Load(configPath)
-	if err != nil {
-		return err
-	}
-	token := os.Getenv("GITHUB_TOKEN")
-	if token == "" {
-		token = os.Getenv("GITHUB_PAT")
-	}
-	if token == "" {
-		return fmt.Errorf("GITHUB_TOKEN (or GITHUB_PAT) is required in environment")
-	}
-
-	if err := os.MkdirAll(defaultStateDir(), 0o755); err != nil {
-		return fmt.Errorf("prepare state dir: %w", err)
-	}
-	if err := os.WriteFile(pidFilePath(), []byte(fmt.Sprintf("%d", os.Getpid())), 0o644); err != nil {
-		return fmt.Errorf("write pid file: %w", err)
-	}
-	defer os.Remove(pidFilePath())
-
-	gh := github.New(token)
-	rm := runner.New(cfg.Defaults.CacheDir, logger)
-
-	rm.CleanupStale(uniqueWorkdirs(cfg))
-
-	logger.Printf("github cleanup: startup sweep")
-	if err := cleanupGitHubRegistrations(context.Background(), gh, cfg, logger); err != nil {
-		logger.Printf("warning: github cleanup failed: %v", err)
-	}
-
-	rec := reconciler.New(logger, gh, rm)
-	rec.SetDesired(cfg)
-
-	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
-	defer stop()
-
-	go func() {
-		signals := make(chan os.Signal, 1)
-		signal.Notify(signals, syscall.SIGHUP)
-		for range signals {
-			logger.Printf("reload requested (SIGHUP)")
-			updated, err := config.Load(configPath)
-			if err != nil {
-				logger.Printf("reload failed: %v", err)
-				continue
-			}
-			rec.SetDesired(updated)
-		}
-	}()
-
-	err = rec.Run(ctx, interval)
-	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
-	defer cancel()
-	rec.Shutdown(shutdownCtx)
-	logger.Printf("github cleanup: shutdown sweep")
-	if err := cleanupGitHubRegistrations(shutdownCtx, gh, cfg, logger); err != nil {
-		logger.Printf("warning: github cleanup (shutdown) failed: %v", err)
-	}
-	return err
-}
-
-func cleanupGitHubRegistrations(ctx context.Context, gh *github.Client, cfg *config.Config, logger *log.Logger) error {
-	runners, err := gh.ListRunners(ctx, cfg.GitHub)
-	if err != nil {
-		return err
-	}
-	groupPrefixes := make(map[string]struct{}, len(cfg.Groups))
-	for _, g := range cfg.Groups {
-		groupPrefixes[g.Name+"-"] = struct{}{}
-	}
-
-	deleted := 0
-	for _, rn := range runners {
-		for prefix := range groupPrefixes {
-			if strings.HasPrefix(rn.Name, prefix) {
-				if err := gh.DeleteRunner(ctx, cfg.GitHub, rn.ID); err != nil {
-					return fmt.Errorf("delete runner %s (%d): %w", rn.Name, rn.ID, err)
-				}
-				deleted++
-				break
-			}
-		}
-	}
-	logger.Printf("github cleanup: inspected=%d deleted=%d", len(runners), deleted)
-	return nil
-}
diff --git a/old-version/internal/cli/purge.go b/old-version/internal/cli/purge.go
deleted file mode 100644
index bac7929..0000000
--- a/old-version/internal/cli/purge.go
+++ /dev/null
@@ -1,87 +0,0 @@
-package cli
-
-import (
-	"context"
-	"fmt"
-	"os"
-	"time"
-
-	"gh-runners-tool/internal/config"
-	"gh-runners-tool/internal/logging"
-	"gh-runners-tool/internal/provider/github"
-	"github.com/spf13/cobra"
-)
-
-func purgeCmd() *cobra.Command {
-	var timeout time.Duration
-	var waitInterval time.Duration
-
-	cmd := &cobra.Command{
-		Use:   "purge",
-		Short: "Delete all self-hosted runners for the configured scope (waits for busy runners to go idle)",
-		RunE: func(cmd *cobra.Command, args []string) error {
-			logger := logging.New()
-
-			cfg, err := config.Load(configPath)
-			if err != nil {
-				return err
-			}
-			token := os.Getenv("GITHUB_TOKEN")
-			if token == "" {
-				token = os.Getenv("GITHUB_PAT")
-			}
-			if token == "" {
-				return fmt.Errorf("GITHUB_TOKEN (or GITHUB_PAT) is required in environment")
-			}
-
-			gh := github.New(token)
-
-			ctx, cancel := context.WithTimeout(context.Background(), timeout)
-			defer cancel()
-
-			logger.Printf("purge: starting (timeout=%s, interval=%s)", timeout, waitInterval)
-			for {
-				runners, err := gh.ListRunners(ctx, cfg.GitHub)
-				if err != nil {
-					return fmt.Errorf("list runners: %w", err)
-				}
-				if len(runners) == 0 {
-					logger.Printf("purge: nothing to delete")
-					return nil
-				}
-
-				deleted := 0
-				busy := 0
-				for _, rn := range runners {
-					if rn.Busy || rn.Status == "busy" {
-						busy++
-						continue
-					}
-					if err := gh.DeleteRunner(ctx, cfg.GitHub, rn.ID); err != nil {
-						return fmt.Errorf("delete runner %s (%d): %w", rn.Name, rn.ID, err)
-					}
-					deleted++
-					logger.Printf("purge: deleted %s (%d)", rn.Name, rn.ID)
-				}
-
-				remaining := len(runners) - deleted
-				if remaining == 0 {
-					logger.Printf("purge: completed")
-					return nil
-				}
-				logger.Printf("purge: remaining=%d busy=%d, waiting %s", remaining, busy, waitInterval)
-
-				select {
-				case <-ctx.Done():
-					return fmt.Errorf("purge timeout: %w", ctx.Err())
-				case <-time.After(waitInterval):
-				}
-			}
-		},
-	}
-
-	cmd.Flags().DurationVar(&timeout, "timeout", 5*time.Minute, "Overall timeout for purge")
-	cmd.Flags().DurationVar(&waitInterval, "interval", 5*time.Second, "Wait interval when runners are busy")
-
-	return cmd
-}
diff --git a/old-version/internal/cli/root.go b/old-version/internal/cli/root.go
deleted file mode 100644
index e6c9280..0000000
--- a/old-version/internal/cli/root.go
+++ /dev/null
@@ -1,73 +0,0 @@
-package cli
-
-import (
-	"os"
-	"path/filepath"
-	"time"
-
-	"gh-runners-tool/internal/config"
-	"github.com/spf13/cobra"
-)
-
-var (
-	configPath string
-	interval   time.Duration
-)
-
-func Execute() error {
-	root := &cobra.Command{
-		Use:   "ghr",
-		Short: "GitHub runner controller (macOS)",
-	}
-
-	root.PersistentFlags().StringVar(&configPath, "config", "config.yaml", "Path to configuration file")
-	root.PersistentFlags().DurationVar(&interval, "interval", 15*time.Second, "Reconciliation interval for daemon")
-
-	root.AddCommand(daemonCmd())
-	root.AddCommand(applyCmd())
-	root.AddCommand(statusCmd())
-	root.AddCommand(purgeCmd())
-
-	return root.Execute()
-}
-
-func defaultStateDir() string {
-	if dir := os.Getenv("GHR_STATE_DIR"); dir != "" {
-		return dir
-	}
-	system := filepath.Join("/var/lib/ghr/state")
-	if err := os.MkdirAll(system, 0o755); err == nil {
-		return system
-	}
-	home, err := os.UserHomeDir()
-	if err != nil {
-		return system
-	}
-	return filepath.Join(home, ".local", "state", "ghr")
-}
-
-func pidFilePath() string {
-	return filepath.Join(defaultStateDir(), "daemon.pid")
-}
-
-func uniqueWorkdirs(cfg *config.Config) []string {
-	seen := make(map[string]struct{})
-	add := func(path string) {
-		if path == "" {
-			return
-		}
-		if _, ok := seen[path]; ok {
-			return
-		}
-		seen[path] = struct{}{}
-	}
-	add(cfg.Defaults.WorkdirBase)
-	for _, g := range cfg.Groups {
-		add(g.WorkdirBase)
-	}
-	out := make([]string, 0, len(seen))
-	for k := range seen {
-		out = append(out, k)
-	}
-	return out
-}
diff --git a/old-version/internal/cli/status.go b/old-version/internal/cli/status.go
deleted file mode 100644
index 1d36503..0000000
--- a/old-version/internal/cli/status.go
+++ /dev/null
@@ -1,160 +0,0 @@
-package cli
-
-import (
-	"errors"
-	"fmt"
-	"os"
-	"path/filepath"
-	"strconv"
-	"strings"
-	"syscall"
-
-	"gh-runners-tool/internal/config"
-	"github.com/spf13/cobra"
-)
-
-func statusCmd() *cobra.Command {
-	cmd := &cobra.Command{
-		Use:   "status",
-		Short: "Show daemon presence (pid file)",
-		RunE: func(cmd *cobra.Command, args []string) error {
-			cfg, err := config.Load(configPath)
-			if err != nil {
-				return fmt.Errorf("load config %s: %w", configPath, err)
-			}
-
-			pidBytes, err := os.ReadFile(pidFilePath())
-			if err != nil {
-				return fmt.Errorf("daemon not running or pid file missing (%s): %w", pidFilePath(), err)
-			}
-			pid, err := strconv.Atoi(strings.TrimSpace(string(pidBytes)))
-			if err != nil {
-				return fmt.Errorf("invalid pid file: %w", err)
-			}
-
-			alive, err := pidAlive(pid)
-			if err != nil {
-				return fmt.Errorf("probe daemon pid %d: %w", pid, err)
-			}
-
-			stats, total, warnings, err := collectRunnerStats(cfg)
-			if err != nil {
-				return err
-			}
-
-			cmd.Printf("daemon: %s (pid=%d)\n", ternary(alive, "running", "not responding"), pid)
-			cmd.Printf("config: %s\n", configPath)
-
-			for _, g := range cfg.Groups {
-				s := stats[g.Name]
-				cmd.Printf("group %-20s desired=%-3d running=%-3d stale=%-3d unknown=%-3d base=%s\n",
-					g.Name, g.Count, s.Running, s.Stale, s.Unknown, g.WorkdirBase)
-			}
-			cmd.Printf("total runners: running=%d stale=%d unknown=%d\n", total.Running, total.Stale, total.Unknown)
-			for _, w := range warnings {
-				cmd.Printf("warning: %s\n", w)
-			}
-			return nil
-		},
-	}
-	return cmd
-}
-
-type runnerStats struct {
-	Running int
-	Stale   int
-	Unknown int
-}
-
-func collectRunnerStats(cfg *config.Config) (map[string]runnerStats, runnerStats, []string, error) {
-	stats := make(map[string]runnerStats, len(cfg.Groups))
-	for _, g := range cfg.Groups {
-		stats[g.Name] = runnerStats{}
-	}
-
-	baseToGroup := make(map[string]string, len(cfg.Groups))
-	for _, g := range cfg.Groups {
-		baseToGroup[g.WorkdirBase] = g.Name
-	}
-
-	var total runnerStats
-	var warnings []string
-
-	for base, group := range baseToGroup {
-		entries, err := os.ReadDir(base)
-		if err != nil {
-			if os.IsNotExist(err) {
-				warnings = append(warnings, fmt.Sprintf("workdir base missing: %s", base))
-				continue
-			}
-			return nil, total, warnings, fmt.Errorf("read workdir base %s: %w", base, err)
-		}
-		for _, entry := range entries {
-			if !entry.IsDir() {
-				continue
-			}
-			dir := filepath.Join(base, entry.Name())
-			pidPath := filepath.Join(dir, ".ghr-pid")
-			pidBytes, err := os.ReadFile(pidPath)
-			if err != nil {
-				stats[group] = addUnknown(stats[group])
-				total.Unknown++
-				continue
-			}
-			pid, err := strconv.Atoi(strings.TrimSpace(string(pidBytes)))
-			if err != nil {
-				stats[group] = addUnknown(stats[group])
-				total.Unknown++
-				continue
-			}
-			alive, err := pidAlive(pid)
-			if err != nil {
-				return nil, total, warnings, fmt.Errorf("probe runner pid %d (%s): %w", pid, dir, err)
-			}
-			if alive {
-				stats[group] = addRunning(stats[group])
-				total.Running++
-			} else {
-				stats[group] = addStale(stats[group])
-				total.Stale++
-			}
-		}
-	}
-	return stats, total, warnings, nil
-}
-
-func pidAlive(pid int) (bool, error) {
-	if pid <= 0 {
-		return false, fmt.Errorf("invalid pid %d", pid)
-	}
-	err := syscall.Kill(pid, 0)
-	if err == nil || errors.Is(err, syscall.EPERM) {
-		return true, nil
-	}
-	if errors.Is(err, syscall.ESRCH) {
-		return false, nil
-	}
-	return false, err
-}
-
-func addRunning(s runnerStats) runnerStats {
-	s.Running++
-	return s
-}
-
-func addStale(s runnerStats) runnerStats {
-	s.Stale++
-	return s
-}
-
-func addUnknown(s runnerStats) runnerStats {
-	s.Unknown++
-	return s
-}
-
-func ternary[T any](cond bool, a, b T) T {
-	if cond {
-		return a
-	}
-	return b
-}
diff --git a/old-version/internal/config/config.go b/old-version/internal/config/config.go
deleted file mode 100644
index af92390..0000000
--- a/old-version/internal/config/config.go
+++ /dev/null
@@ -1,110 +0,0 @@
-package config
-
-import (
-	"fmt"
-	"os"
-	"path/filepath"
-
-	"github.com/joho/godotenv"
-	"gopkg.in/yaml.v3"
-)
-
-type GitHubScope string
-
-const (
-	ScopeOrg  GitHubScope = "org"
-	ScopeRepo GitHubScope = "repo"
-)
-
-type GitHubConfig struct {
-	Scope GitHubScope `yaml:"scope"`
-	Owner string      `yaml:"owner"`
-	Repo  string      `yaml:"repo,omitempty"`
-}
-
-type RunnerDefaults struct {
-	WorkdirBase string `yaml:"workdir_base"`
-	CacheDir    string `yaml:"cache_dir"`
-	Version     string `yaml:"version"` // e.g. "2.319.1" or "latest"
-}
-
-type GroupSpec struct {
-	Name        string   `yaml:"name"`
-	Count       int      `yaml:"count"`
-	Ephemeral   bool     `yaml:"ephemeral"`
-	Labels      []string `yaml:"labels"`
-	WorkdirBase string   `yaml:"workdir_base,omitempty"`
-	Version     string   `yaml:"version,omitempty"`
-}
-
-type Config struct {
-	GitHub   GitHubConfig   `yaml:"github"`
-	Defaults RunnerDefaults `yaml:"defaults"`
-	Groups   []GroupSpec    `yaml:"groups"`
-}
-
-// Load loads configuration from YAML and .env (env is mandatory for tokens).
-func Load(path string) (*Config, error) {
-	if err := godotenv.Load(); err != nil && !os.IsNotExist(err) {
-		return nil, fmt.Errorf("loading .env: %w", err)
-	}
-
-	bytes, err := os.ReadFile(path)
-	if err != nil {
-		return nil, fmt.Errorf("read config: %w", err)
-	}
-
-	cfg := &Config{}
-	if err := yaml.Unmarshal(bytes, cfg); err != nil {
-		return nil, fmt.Errorf("parse config: %w", err)
-	}
-
-	if err := validate(cfg); err != nil {
-		return nil, err
-	}
-
-	if cfg.Defaults.WorkdirBase == "" {
-		cfg.Defaults.WorkdirBase = "/var/lib/ghr/groups"
-	}
-	if cfg.Defaults.CacheDir == "" {
-		cfg.Defaults.CacheDir = "/var/lib/ghr/cache"
-	}
-	if cfg.Defaults.Version == "" {
-		cfg.Defaults.Version = "latest"
-	}
-
-	for i := range cfg.Groups {
-		if cfg.Groups[i].WorkdirBase == "" {
-			cfg.Groups[i].WorkdirBase = filepath.Join(cfg.Defaults.WorkdirBase, cfg.Groups[i].Name)
-		}
-		if cfg.Groups[i].Version == "" {
-			cfg.Groups[i].Version = cfg.Defaults.Version
-		}
-	}
-
-	return cfg, nil
-}
-
-func validate(cfg *Config) error {
-	if cfg.GitHub.Scope != ScopeOrg && cfg.GitHub.Scope != ScopeRepo {
-		return fmt.Errorf("github.scope must be 'org' or 'repo'")
-	}
-	if cfg.GitHub.Owner == "" {
-		return fmt.Errorf("github.owner is required")
-	}
-	if cfg.GitHub.Scope == ScopeRepo && cfg.GitHub.Repo == "" {
-		return fmt.Errorf("github.repo is required when scope=repo")
-	}
-	if len(cfg.Groups) == 0 {
-		return fmt.Errorf("at least one group is required")
-	}
-	for _, g := range cfg.Groups {
-		if g.Name == "" {
-			return fmt.Errorf("group.name is required")
-		}
-		if g.Count < 0 {
-			return fmt.Errorf("group.count must be >= 0")
-		}
-	}
-	return nil
-}
diff --git a/old-version/internal/domain/types.go b/old-version/internal/domain/types.go
deleted file mode 100644
index b0a550c..0000000
--- a/old-version/internal/domain/types.go
+++ /dev/null
@@ -1,19 +0,0 @@
-package domain
-
-type Group struct {
-	Name      string
-	Count     int
-	Ephemeral bool
-	Labels    []string
-	Workdir   string
-	Version   string
-}
-
-type RunnerInstance struct {
-	ID        string
-	GroupName string
-	Ephemeral bool
-	Workdir   string
-	Labels    []string
-	Version   string
-}
diff --git a/old-version/internal/logging/logging.go b/old-version/internal/logging/logging.go
deleted file mode 100644
index d0cbbb6..0000000
--- a/old-version/internal/logging/logging.go
+++ /dev/null
@@ -1,12 +0,0 @@
-package logging
-
-import (
-	"log"
-	"os"
-)
-
-// * Provides a basic logger configured for stdout.
-func New() *log.Logger {
-	logger := log.New(os.Stdout, "[ghr] ", log.LstdFlags|log.Lmicroseconds)
-	return logger
-}
diff --git a/old-version/internal/provider/github/client.go b/old-version/internal/provider/github/client.go
deleted file mode 100644
index d6c3782..0000000
--- a/old-version/internal/provider/github/client.go
+++ /dev/null
@@ -1,166 +0,0 @@
-package github
-
-import (
-	"bytes"
-	"context"
-	"encoding/json"
-	"fmt"
-	"io"
-	"net/http"
-	"time"
-
-	"gh-runners-tool/internal/config"
-)
-
-type Client struct {
-	httpClient *http.Client
-	token      string
-}
-
-type registrationTokenResponse struct {
-	Token     string    `json:"token"`
-	ExpiresAt time.Time `json:"expires_at"`
-}
-
-// New creates a GitHub API client using a PAT from env.
-func New(token string) *Client {
-	return &Client{
-		httpClient: &http.Client{Timeout: 15 * time.Second},
-		token:      token,
-	}
-}
-
-type Runner struct {
-	ID     int64  `json:"id"`
-	Name   string `json:"name"`
-	Status string `json:"status"`
-	Busy   bool   `json:"busy"`
-}
-
-type listRunnersResponse struct {
-	Runners []Runner `json:"runners"`
-}
-
-// RegistrationToken requests a registration token for runners.
-func (c *Client) RegistrationToken(ctx context.Context, gh config.GitHubConfig) (string, error) {
-	url := ""
-	switch gh.Scope {
-	case config.ScopeOrg:
-		url = fmt.Sprintf("https://api.github.com/orgs/%s/actions/runners/registration-token", gh.Owner)
-	case config.ScopeRepo:
-		url = fmt.Sprintf("https://api.github.com/repos/%s/%s/actions/runners/registration-token", gh.Owner, gh.Repo)
-	default:
-		return "", fmt.Errorf("unknown scope %s", gh.Scope)
-	}
-
-	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader([]byte("{}")))
-	if err != nil {
-		return "", fmt.Errorf("build request: %w", err)
-	}
-	req.Header.Set("Accept", "application/vnd.github+json")
-	req.Header.Set("Authorization", "Bearer "+c.token)
-
-	resp, err := c.httpClient.Do(req)
-	if err != nil {
-		return "", fmt.Errorf("request registration token: %w", err)
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode >= 300 {
-		return "", fmt.Errorf("registration token failed: status %d", resp.StatusCode)
-	}
-
-	var decoded registrationTokenResponse
-	if err := json.NewDecoder(resp.Body).Decode(&decoded); err != nil {
-		return "", fmt.Errorf("decode response: %w", err)
-	}
-	if decoded.Token == "" {
-		return "", fmt.Errorf("empty token returned")
-	}
-	return decoded.Token, nil
-}
-
-// ListRunners returns all runners for the configured scope (first page).
-func (c *Client) ListRunners(ctx context.Context, gh config.GitHubConfig) ([]Runner, error) {
-	var all []Runner
-	page := 1
-
-	for {
-		url := ""
-		switch gh.Scope {
-		case config.ScopeOrg:
-			url = fmt.Sprintf("https://api.github.com/orgs/%s/actions/runners?per_page=100&page=%d", gh.Owner, page)
-		case config.ScopeRepo:
-			url = fmt.Sprintf("https://api.github.com/repos/%s/%s/actions/runners?per_page=100&page=%d", gh.Owner, gh.Repo, page)
-		default:
-			return nil, fmt.Errorf("unknown scope %s", gh.Scope)
-		}
-
-		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
-		if err != nil {
-			return nil, fmt.Errorf("build request: %w", err)
-		}
-		req.Header.Set("Accept", "application/vnd.github+json")
-		req.Header.Set("Authorization", "Bearer "+c.token)
-
-		resp, err := c.httpClient.Do(req)
-		if err != nil {
-			return nil, fmt.Errorf("list runners: %w", err)
-		}
-		if resp.StatusCode >= 300 {
-			resp.Body.Close()
-			return nil, fmt.Errorf("list runners failed: status %d", resp.StatusCode)
-		}
-
-		var decoded listRunnersResponse
-		if err := json.NewDecoder(resp.Body).Decode(&decoded); err != nil {
-			resp.Body.Close()
-			return nil, fmt.Errorf("decode response: %w", err)
-		}
-
-		all = append(all, decoded.Runners...)
-		resp.Body.Close()
-
-		if len(decoded.Runners) < 100 {
-			break
-		}
-		page++
-	}
-
-	return all, nil
-}
-
-// DeleteRunner removes a runner registration by ID.
-func (c *Client) DeleteRunner(ctx context.Context, gh config.GitHubConfig, id int64) error {
-	url := ""
-	switch gh.Scope {
-	case config.ScopeOrg:
-		url = fmt.Sprintf("https://api.github.com/orgs/%s/actions/runners/%d", gh.Owner, id)
-	case config.ScopeRepo:
-		url = fmt.Sprintf("https://api.github.com/repos/%s/%s/actions/runners/%d", gh.Owner, gh.Repo, id)
-	default:
-		return fmt.Errorf("unknown scope %s", gh.Scope)
-	}
-
-	req, err := http.NewRequestWithContext(ctx, http.MethodDelete, url, nil)
-	if err != nil {
-		return fmt.Errorf("build request: %w", err)
-	}
-	req.Header.Set("Accept", "application/vnd.github+json")
-	req.Header.Set("Authorization", "Bearer "+c.token)
-
-	resp, err := c.httpClient.Do(req)
-	if err != nil {
-		return fmt.Errorf("delete runner: %w", err)
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode == http.StatusNotFound {
-		return nil
-	}
-	if resp.StatusCode >= 300 {
-		body, _ := io.ReadAll(resp.Body)
-		return fmt.Errorf("delete runner failed: status %d body=%s", resp.StatusCode, string(body))
-	}
-	return nil
-}
diff --git a/old-version/internal/reconciler/reconciler.go b/old-version/internal/reconciler/reconciler.go
deleted file mode 100644
index c2db09a..0000000
--- a/old-version/internal/reconciler/reconciler.go
+++ /dev/null
@@ -1,160 +0,0 @@
-package reconciler
-
-import (
-	"context"
-	"fmt"
-	"sync"
-	"time"
-
-	"gh-runners-tool/internal/config"
-	"gh-runners-tool/internal/domain"
-	"gh-runners-tool/internal/provider/github"
-	"gh-runners-tool/internal/runner"
-)
-
-type Logger interface {
-	Printf(string, ...any)
-}
-
-type Reconciler struct {
-	logger  Logger
-	gh      *github.Client
-	runners *runner.Manager
-
-	mu         sync.Mutex
-	desired    *config.Config
-	groupPools map[string]*slotPool
-	ghCfg      config.GitHubConfig
-
-	stopOnce sync.Once
-}
-
-func New(logger Logger, gh *github.Client, runners *runner.Manager) *Reconciler {
-	return &Reconciler{
-		logger:     logger,
-		gh:         gh,
-		runners:    runners,
-		groupPools: make(map[string]*slotPool),
-	}
-}
-
-func (r *Reconciler) SetDesired(cfg *config.Config) {
-	r.mu.Lock()
-	defer r.mu.Unlock()
-	r.desired = cfg
-	r.ghCfg = cfg.GitHub
-}
-
-func (r *Reconciler) Run(ctx context.Context, interval time.Duration) error {
-	if interval <= 0 {
-		interval = 15 * time.Second
-	}
-
-	ticker := time.NewTicker(interval)
-	defer ticker.Stop()
-
-	if err := r.reconcile(ctx); err != nil {
-		r.logger.Printf("reconcile error: %v", err)
-	}
-
-	for {
-		select {
-		case <-ctx.Done():
-			return ctx.Err()
-		case <-ticker.C:
-			if err := r.reconcile(ctx); err != nil {
-				r.logger.Printf("reconcile error: %v", err)
-			}
-		}
-	}
-}
-
-func (r *Reconciler) reconcile(ctx context.Context) error {
-	r.mu.Lock()
-	cfg := r.desired
-	r.mu.Unlock()
-
-	if cfg == nil {
-		return fmt.Errorf("no desired config set")
-	}
-
-	desired := make(map[string]domain.Group, len(cfg.Groups))
-	for _, g := range cfg.Groups {
-		desired[g.Name] = domain.Group{
-			Name:      g.Name,
-			Count:     g.Count,
-			Ephemeral: g.Ephemeral,
-			Labels:    g.Labels,
-			Workdir:   g.WorkdirBase,
-			Version:   g.Version,
-		}
-	}
-
-	// Remove pools for groups that are no longer desired.
-	for name := range r.groupPools {
-		if _, ok := desired[name]; !ok {
-			r.groupPools[name].stop()
-			delete(r.groupPools, name)
-			r.logger.Printf("group %s stopped", name)
-		}
-	}
-
-	// Ensure pools exist and match desired count.
-	for name, grp := range desired {
-		pool, ok := r.groupPools[name]
-		if !ok {
-			pool = newSlotPool(r.logger, r.gh, r.runners, grp, r.ghCfg)
-			r.groupPools[name] = pool
-			r.logger.Printf("group %s started with %d slots", name, grp.Count)
-		}
-		pool.update(grp)
-	}
-
-	return nil
-}
-
-// Shutdown stops all slots and runners when the daemon exits.
-func (r *Reconciler) Shutdown(ctx context.Context) {
-	r.stopOnce.Do(func() {
-		r.mu.Lock()
-		pools := r.snapshotPools()
-		r.mu.Unlock()
-
-		stopPools(pools)
-
-		waitCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
-		defer cancel()
-		waitPools(waitCtx, pools, r.logger)
-	})
-}
-
-func (r *Reconciler) snapshotPools() []*slotPool {
-	out := make([]*slotPool, 0, len(r.groupPools))
-	for _, p := range r.groupPools {
-		out = append(out, p)
-	}
-	return out
-}
-
-func stopPools(pools []*slotPool) {
-	for _, p := range pools {
-		p.stop()
-	}
-}
-
-func waitPools(ctx context.Context, pools []*slotPool, logger Logger) {
-	for _, p := range pools {
-		p.wait(ctx)
-	}
-}
-
-// Status returns a snapshot of current slots by group.
-func (r *Reconciler) Status() map[string]int {
-	r.mu.Lock()
-	defer r.mu.Unlock()
-	out := make(map[string]int)
-	for name, pool := range r.groupPools {
-		out[name] = pool.size()
-	}
-	return out
-}
diff --git a/old-version/internal/reconciler/slots.go b/old-version/internal/reconciler/slots.go
deleted file mode 100644
index 68061ab..0000000
--- a/old-version/internal/reconciler/slots.go
+++ /dev/null
@@ -1,229 +0,0 @@
-package reconciler
-
-import (
-	"context"
-	"fmt"
-	"math/rand"
-	"sync"
-	"time"
-
-	"gh-runners-tool/internal/config"
-	"gh-runners-tool/internal/domain"
-	"gh-runners-tool/internal/provider/github"
-	"gh-runners-tool/internal/runner"
-)
-
-type slotPool struct {
-	logger  Logger
-	gh      *github.Client
-	runners *runner.Manager
-	ghCfg   config.GitHubConfig
-
-	mu       sync.Mutex
-	group    domain.Group
-	slots    map[int]context.CancelFunc
-	wg       sync.WaitGroup
-	stopping bool
-}
-
-func newSlotPool(logger Logger, gh *github.Client, runners *runner.Manager, group domain.Group, ghCfg config.GitHubConfig) *slotPool {
-	return &slotPool{
-		logger:  logger,
-		gh:      gh,
-		runners: runners,
-		ghCfg:   ghCfg,
-		group:   group,
-		slots:   make(map[int]context.CancelFunc),
-	}
-}
-
-func (p *slotPool) update(group domain.Group) {
-	p.mu.Lock()
-	defer p.mu.Unlock()
-	p.group = group
-	target := group.Count
-	current := len(p.slots)
-
-	if target > current {
-		for i := current; i < target; i++ {
-			p.startSlotLocked(i)
-		}
-	}
-	if target < current {
-		diff := current - target
-		i := 0
-		for id, cancel := range p.slots {
-			if i >= diff {
-				break
-			}
-			cancel()
-			delete(p.slots, id)
-			i++
-		}
-	}
-}
-
-func (p *slotPool) startSlotLocked(id int) {
-	ctx, cancel := context.WithCancel(context.Background())
-	p.slots[id] = cancel
-	p.wg.Add(1)
-	go p.runSlot(ctx, id)
-}
-
-func (p *slotPool) runSlot(ctx context.Context, slotID int) {
-	defer p.wg.Done()
-
-	const (
-		minBackoff = 2 * time.Second
-		maxBackoff = 30 * time.Second
-	)
-	backoff := minBackoff
-
-	for {
-		group := p.currentGroup()
-
-		select {
-		case <-ctx.Done():
-			return
-		default:
-		}
-
-		token, err := p.gh.RegistrationToken(ctx, p.ghCfg)
-		if err != nil {
-			p.logger.Printf("slot %d group=%s: registration token: %v", slotID, group.Name, err)
-			if !sleepOrDone(ctx, jitter(backoff)) {
-				return
-			}
-			if backoff < maxBackoff {
-				backoff *= 2
-				if backoff > maxBackoff {
-					backoff = maxBackoff
-				}
-			}
-			continue
-		}
-
-		inst := runner.NewRunnerInstance(group)
-		handle, err := p.runners.Start(ctx, inst, p.ghCfg, token)
-		if err != nil {
-			p.logger.Printf("slot %d group=%s: start runner: %v", slotID, group.Name, err)
-			if !sleepOrDone(ctx, jitter(backoff)) {
-				return
-			}
-			if backoff < maxBackoff {
-				backoff *= 2
-				if backoff > maxBackoff {
-					backoff = maxBackoff
-				}
-			}
-			continue
-		}
-
-		backoff = minBackoff
-
-		err = handle.Wait()
-		if err != nil {
-			p.logger.Printf("slot %d group=%s: runner %s exited with error: %v", slotID, group.Name, handle.ID, err)
-		} else {
-			p.logger.Printf("slot %d group=%s: runner %s exited normally", slotID, group.Name, handle.ID)
-		}
-
-		go func(h *runner.Handle) {
-			ctxUnreg, cancel := context.WithTimeout(ctx, 15*time.Second)
-			defer cancel()
-			if err := p.unregister(ctxUnreg, h); err != nil {
-				p.logger.Printf("slot %d group=%s: unregister %s: %v", slotID, group.Name, h.ID, err)
-			}
-		}(handle)
-
-		if !sleepOrDone(ctx, jitter(minBackoff)) {
-			return
-		}
-	}
-}
-
-func (p *slotPool) unregister(ctx context.Context, h *runner.Handle) error {
-	name := runnerName(h.Group, h.ID)
-	runners, err := p.gh.ListRunners(ctx, p.ghCfg)
-	if err != nil {
-		return err
-	}
-	for _, rn := range runners {
-		if rn.Name == name {
-			return p.gh.DeleteRunner(ctx, p.ghCfg, rn.ID)
-		}
-	}
-	return nil
-}
-
-func (p *slotPool) stop() {
-	p.mu.Lock()
-	defer p.mu.Unlock()
-	if p.stopping {
-		return
-	}
-	p.stopping = true
-	for _, cancel := range p.slots {
-		cancel()
-	}
-}
-
-func (p *slotPool) wait(ctx context.Context) {
-	done := make(chan struct{})
-	go func() {
-		p.wg.Wait()
-		close(done)
-	}()
-
-	select {
-	case <-done:
-	case <-ctx.Done():
-		p.logger.Printf("slot pool group=%s wait timeout", p.group.Name)
-	}
-}
-
-func sleepOrDone(ctx context.Context, d time.Duration) bool {
-	select {
-	case <-time.After(d):
-		return true
-	case <-ctx.Done():
-		return false
-	}
-}
-
-func (p *slotPool) size() int {
-	p.mu.Lock()
-	defer p.mu.Unlock()
-	return len(p.slots)
-}
-
-func (p *slotPool) currentGroup() domain.Group {
-	p.mu.Lock()
-	defer p.mu.Unlock()
-	return p.group
-}
-
-func jitter(d time.Duration) time.Duration {
-	if d <= 0 {
-		return time.Second
-	}
-	// * Apply ±20% jitter to avoid thundering herd on retries.
-	delta := d / 5
-	if delta <= 0 {
-		delta = time.Millisecond
-	}
-	offset := rand.Int63n(int64(delta)*2+1) - int64(delta)
-	out := d + time.Duration(offset)
-	if out < time.Millisecond {
-		return time.Millisecond
-	}
-	return out
-}
-
-func init() {
-	rand.Seed(time.Now().UnixNano())
-}
-
-func runnerName(group, id string) string {
-	return fmt.Sprintf("%s-%s", group, id)
-}
diff --git a/old-version/internal/runner/manager.go b/old-version/internal/runner/manager.go
deleted file mode 100644
index 489eca1..0000000
--- a/old-version/internal/runner/manager.go
+++ /dev/null
@@ -1,398 +0,0 @@
-package runner
-
-import (
-	"archive/tar"
-	"compress/gzip"
-	"context"
-	"crypto/rand"
-	"encoding/hex"
-	"encoding/json"
-	"errors"
-	"fmt"
-	"io"
-	"net/http"
-	"os"
-	"os/exec"
-	"path/filepath"
-	"runtime"
-	"strconv"
-	"strings"
-	"sync"
-	"time"
-
-	"gh-runners-tool/internal/config"
-	"gh-runners-tool/internal/domain"
-)
-
-const pidFileName = ".ghr-pid"
-
-type Manager struct {
-	cacheDir   string
-	logger     Logger
-	httpClient *http.Client
-	mu         sync.Mutex
-}
-
-type Logger interface {
-	Printf(string, ...any)
-}
-
-type Handle struct {
-	ID      string
-	Group   string
-	Cmd     *exec.Cmd
-	Workdir string
-	done    chan struct{}
-	err     error
-}
-
-func (h *Handle) Wait() error {
-	<-h.done
-	return h.err
-}
-
-func (h *Handle) Done() <-chan struct{} {
-	return h.done
-}
-
-func New(cacheDir string, logger Logger) *Manager {
-	return &Manager{
-		cacheDir:   cacheDir,
-		logger:     logger,
-		httpClient: &http.Client{Timeout: 60 * time.Second},
-	}
-}
-
-// Start prepares and launches a runner process for the given instance.
-func (m *Manager) Start(ctx context.Context, inst domain.RunnerInstance, gh config.GitHubConfig, registrationToken string) (*Handle, error) {
-	baseDir, err := m.ensureRunnerBits(ctx, inst.Version)
-	if err != nil {
-		return nil, err
-	}
-
-	if err := os.MkdirAll(inst.Workdir, 0o755); err != nil {
-		return nil, fmt.Errorf("create workdir: %w", err)
-	}
-
-	if err := copyDir(baseDir, inst.Workdir); err != nil {
-		return nil, fmt.Errorf("copy runner files: %w", err)
-	}
-
-	name := fmt.Sprintf("%s-%s", inst.GroupName, inst.ID)
-	url := runnerURL(gh)
-
-	configArgs := []string{
-		filepath.Join(inst.Workdir, "config.sh"),
-		"--unattended",
-		"--url", url,
-		"--token", registrationToken,
-		"--name", name,
-	}
-	if len(inst.Labels) > 0 {
-		configArgs = append(configArgs, "--labels", strings.Join(inst.Labels, ","))
-	}
-	if inst.Ephemeral {
-		configArgs = append(configArgs, "--ephemeral")
-	}
-
-	configCmd := exec.CommandContext(ctx, "bash", configArgs...)
-	configCmd.Dir = inst.Workdir
-	configCmd.Stdout = os.Stdout
-	configCmd.Stderr = os.Stderr
-	if err := configCmd.Run(); err != nil {
-		_ = os.RemoveAll(inst.Workdir)
-		return nil, fmt.Errorf("config runner: %w", err)
-	}
-
-	runCmd := exec.CommandContext(ctx, filepath.Join(inst.Workdir, "run.sh"))
-	runCmd.Dir = inst.Workdir
-	runCmd.Stdout = os.Stdout
-	runCmd.Stderr = os.Stderr
-
-	if err := runCmd.Start(); err != nil {
-		_ = os.RemoveAll(inst.Workdir)
-		return nil, fmt.Errorf("start runner: %w", err)
-	}
-
-	if err := m.writePID(inst.Workdir, runCmd.Process.Pid); err != nil {
-		_ = runCmd.Process.Kill()
-		_ = os.RemoveAll(inst.Workdir)
-		return nil, fmt.Errorf("write pid: %w", err)
-	}
-
-	handle := &Handle{
-		ID:      inst.ID,
-		Group:   inst.GroupName,
-		Cmd:     runCmd,
-		Workdir: inst.Workdir,
-		done:    make(chan struct{}),
-	}
-
-	go func() {
-		defer close(handle.done)
-		handle.err = runCmd.Wait()
-		// Cleanup workdir regardless of exit status.
-		_ = os.RemoveAll(inst.Workdir)
-	}()
-
-	return handle, nil
-}
-
-func (m *Manager) ensureRunnerBits(ctx context.Context, version string) (string, error) {
-	resolvedVersion, err := m.resolveVersion(ctx, version)
-	if err != nil {
-		return "", err
-	}
-
-	m.mu.Lock()
-	defer m.mu.Unlock()
-
-	targetDir := filepath.Join(m.cacheDir, resolvedVersion)
-	if _, err := os.Stat(targetDir); err == nil {
-		return targetDir, nil
-	}
-
-	if err := os.MkdirAll(targetDir, 0o755); err != nil {
-		return "", fmt.Errorf("create cache dir: %w", err)
-	}
-
-	archivePath := filepath.Join(m.cacheDir, fmt.Sprintf("actions-runner-%s.tar.gz", resolvedVersion))
-	if err := m.downloadRunner(ctx, resolvedVersion, archivePath); err != nil {
-		return "", err
-	}
-
-	if err := untar(archivePath, targetDir); err != nil {
-		return "", fmt.Errorf("untar: %w", err)
-	}
-
-	return targetDir, nil
-}
-
-func (m *Manager) downloadRunner(ctx context.Context, version, dest string) error {
-	url := runnerDownloadURL(version)
-	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
-	if err != nil {
-		return fmt.Errorf("build request: %w", err)
-	}
-
-	resp, err := m.httpClient.Do(req)
-	if err != nil {
-		return fmt.Errorf("download runner: %w", err)
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode >= 300 {
-		return fmt.Errorf("download runner failed: status %d", resp.StatusCode)
-	}
-
-	f, err := os.Create(dest)
-	if err != nil {
-		return fmt.Errorf("create archive: %w", err)
-	}
-	defer f.Close()
-
-	if _, err := io.Copy(f, resp.Body); err != nil {
-		return fmt.Errorf("write archive: %w", err)
-	}
-
-	return nil
-}
-
-func runnerDownloadURL(version string) string {
-	resolved := version
-	arch := "x64"
-	if runtime.GOARCH == "arm64" {
-		arch = "arm64"
-	}
-	return fmt.Sprintf("https://github.com/actions/runner/releases/download/v%s/actions-runner-osx-%s-%s.tar.gz", resolved, arch, resolved)
-}
-
-func (m *Manager) resolveVersion(ctx context.Context, version string) (string, error) {
-	if version != "latest" {
-		return version, nil
-	}
-
-	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.github.com/repos/actions/runner/releases/latest", nil)
-	if err != nil {
-		return "", err
-	}
-	resp, err := m.httpClient.Do(req)
-	if err != nil {
-		return "", err
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode >= 300 {
-		return "", fmt.Errorf("latest version lookup failed: status %d", resp.StatusCode)
-	}
-	var payload struct {
-		TagName string `json:"tag_name"`
-	}
-	if err := json.NewDecoder(resp.Body).Decode(&payload); err != nil {
-		return "", err
-	}
-	tag := strings.TrimPrefix(payload.TagName, "v")
-	if tag == "" {
-		return "", fmt.Errorf("empty tag from latest release")
-	}
-	return tag, nil
-}
-
-func runnerURL(gh config.GitHubConfig) string {
-	if gh.Scope == config.ScopeRepo {
-		return fmt.Sprintf("https://github.com/%s/%s", gh.Owner, gh.Repo)
-	}
-	return fmt.Sprintf("https://github.com/%s", gh.Owner)
-}
-
-func untar(src, dest string) error {
-	f, err := os.Open(src)
-	if err != nil {
-		return err
-	}
-	defer f.Close()
-
-	gzr, err := gzip.NewReader(f)
-	if err != nil {
-		return err
-	}
-	defer gzr.Close()
-
-	tr := tar.NewReader(gzr)
-	for {
-		header, err := tr.Next()
-		if errors.Is(err, io.EOF) {
-			break
-		}
-		if err != nil {
-			return err
-		}
-
-		targetPath := filepath.Join(dest, header.Name)
-
-		switch header.Typeflag {
-		case tar.TypeDir:
-			if err := os.MkdirAll(targetPath, os.FileMode(header.Mode)); err != nil {
-				return err
-			}
-		case tar.TypeReg:
-			if err := os.MkdirAll(filepath.Dir(targetPath), 0o755); err != nil {
-				return err
-			}
-			outFile, err := os.OpenFile(targetPath, os.O_CREATE|os.O_RDWR|os.O_TRUNC, os.FileMode(header.Mode))
-			if err != nil {
-				return err
-			}
-			if _, err := io.Copy(outFile, tr); err != nil {
-				_ = outFile.Close()
-				return err
-			}
-			_ = outFile.Close()
-		default:
-			continue
-		}
-	}
-	return nil
-}
-
-func copyDir(src, dst string) error {
-	return filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
-		if err != nil {
-			return err
-		}
-		rel, err := filepath.Rel(src, path)
-		if err != nil {
-			return err
-		}
-		target := filepath.Join(dst, rel)
-
-		if info.IsDir() {
-			return os.MkdirAll(target, info.Mode())
-		}
-
-		if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
-			return err
-		}
-
-		srcFile, err := os.Open(path)
-		if err != nil {
-			return err
-		}
-		defer srcFile.Close()
-
-		dstFile, err := os.OpenFile(target, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, info.Mode())
-		if err != nil {
-			return err
-		}
-		defer dstFile.Close()
-
-		if _, err := io.Copy(dstFile, srcFile); err != nil {
-			return err
-		}
-
-		return nil
-	})
-}
-
-// NewRunnerInstance builds a runner instance descriptor with generated ID.
-func NewRunnerInstance(group domain.Group) domain.RunnerInstance {
-	id := randID()
-	return domain.RunnerInstance{
-		ID:        id,
-		GroupName: group.Name,
-		Ephemeral: group.Ephemeral,
-		Workdir:   filepath.Join(group.Workdir, id),
-		Labels:    group.Labels,
-		Version:   group.Version,
-	}
-}
-
-func randID() string {
-	var b [4]byte
-	_, _ = rand.Read(b[:])
-	return hex.EncodeToString(b[:])
-}
-
-func (m *Manager) writePID(workdir string, pid int) error {
-	pidPath := filepath.Join(workdir, pidFileName)
-	return os.WriteFile(pidPath, []byte(strconv.Itoa(pid)), 0o644)
-}
-
-// CleanupStale removes leftover runner workdirs and terminates stray runner processes in known bases.
-func (m *Manager) CleanupStale(bases []string) {
-	for _, base := range bases {
-		entries, err := os.ReadDir(base)
-		if err != nil {
-			m.logger.Printf("cleanup: skip base %s: %v", base, err)
-			continue
-		}
-		for _, entry := range entries {
-			if !entry.IsDir() {
-				continue
-			}
-			dir := filepath.Join(base, entry.Name())
-			pidPath := filepath.Join(dir, pidFileName)
-			if pidBytes, err := os.ReadFile(pidPath); err == nil {
-				if pid, err := strconv.Atoi(strings.TrimSpace(string(pidBytes))); err == nil {
-					if err := killPID(pid); err != nil {
-						m.logger.Printf("kill stale pid %d (%s): %v", pid, dir, err)
-					}
-				}
-			}
-			if err := os.RemoveAll(dir); err != nil {
-				m.logger.Printf("remove stale workdir %s: %v", dir, err)
-			}
-		}
-	}
-}
-
-func killPID(pid int) error {
-	if pid <= 0 {
-		return fmt.Errorf("invalid pid %d", pid)
-	}
-	proc, err := os.FindProcess(pid)
-	if err != nil {
-		return err
-	}
-	return proc.Kill()
-}
diff --git a/tests/complete/README.md b/tests/complete/README.md
new file mode 100644
index 0000000..ec93874
--- /dev/null
+++ b/tests/complete/README.md
@@ -0,0 +1,106 @@
+# Complete Test
+
+Full end-to-end test of all ghr v2 features with 4 groups, 20 jobs, and all edge cases.
+
+## What is tested
+
+### Scale set management
+- 4 scale sets created at startup
+- Scale sets deleted on shutdown (Ctrl+C)
+- Per-group health override (ghr-deploy: runner_timeout=10m)
+
+### Scaling behavior
+- Scale-up from 0 to max (ghr-heavy: 0 -> 2)
+- Pre-provisioned idle runner (ghr-fast: min=1, ghr-single: min=1)
+- Scale-up to max under load (ghr-fast: 1 -> 3)
+- Job queuing when max reached (ghr-fast 4th job waits)
+- Scale-down after job completion (ephemeral runners)
+- Second wave of jobs after first batch completes
+- Sequential enforcement with max=1 (ghr-deploy: 3 jobs one after another)
+- Always-on min=max=1 (ghr-single: runner always available)
+
+### Runner lifecycle
+- Runner provisioned (workdir copy, JIT config, process start)
+- Job started (idle -> busy transition)
+- Job completed success (stop, cleanup workdir)
+- Job completed failure (runner.failed event, cleanup still happens)
+- Instant job (fast provision/cleanup cycle)
+- Multi-step job (steps share runner)
+- High stdout output (100 lines of payload)
+
+### Health monitoring (check_interval=10s)
+- Runner liveness checks (kill -0 on PIDs)
+- Runner timeout detection (runner_timeout=5m, won't trigger in test)
+- Idle timeout (idle_timeout=2m; triggers on the idle runners kept for min_runners once all jobs are done)
+- Disk space check (min_disk_space=500MB)
+- Health issues -> notification events
+
+### Notifications (Discord)
+- runner.failed event sent when edge-fail job fails
+- health.* events sent on any health issue
+- daemon.start / daemon.stop events
+
+### Monitoring (Uptime Kuma)
+- Daemon health push every check_interval (10s)
+- Per-group health push (4 groups, 4 push tokens)
+- Degraded threshold at 0.5
+
+### Logging
+- Daemon log: {log_dir}/daemon/{date}.json
+- Group logs: {log_dir}/groups/{group}/{date}.json (4 groups)
+- Runner logs: {log_dir}/groups/{group}/runners/{runner}/{date}.json
+- Console output in text format with debug level
+- Runner stdout captured in runner log files
+
+### Shutdown
+- Ctrl+C triggers graceful shutdown
+- All idle runners killed
+- All workdirs cleaned
+- All scale sets deleted
+- PID file removed
+- State file removed
+- Socket removed
+- No orphan processes
+
+## Setup
+
+1. Copy `env.example` to `.env` and fill in your values
+2. Edit `config.yaml` and set `github.url` to your org
+3. Run:
+
+```bash
+cd tests/complete
+cp env.example .env
+# Edit .env with your Discord webhook URL, Uptime Kuma URL, and push tokens
+
+ghr run --config config.yaml --log-level debug
+```
+
+4. Copy `workflow.yml` to `.github/workflows/test-ghr-complete.yml` in your repo
+5. Trigger from GitHub Actions > "Run workflow"
+
+## Verification checklist
+
+After the workflow completes:
+
+- [ ] All 20 jobs completed in GitHub Actions (19 success, 1 failure)
+- [ ] ghr-fast scaled to 3 runners concurrently
+- [ ] ghr-heavy scaled to 2 runners concurrently
+- [ ] ghr-deploy ran 3 jobs sequentially (max=1)
+- [ ] ghr-single had pre-provisioned runner at startup
+- [ ] edge-fail shows `result=failed` in ghr logs
+- [ ] Discord received a notification for the failed job
+- [ ] Uptime Kuma shows pushes for daemon + 4 groups
+
+After Ctrl+C:
+
+- [ ] No runner processes remain (`ps aux | grep Runner.Listener`)
+- [ ] Workdirs empty (`ls ~/.local/share/ghr/runners/`)
+- [ ] No PID file (`ls ~/.local/state/ghr/daemon.pid`)
+- [ ] No socket (`ls ~/.local/state/ghr/ghr.sock`)
+- [ ] Log files exist with structured JSON entries
+
+After waiting 2+ minutes idle (before Ctrl+C):
+
+- [ ] Idle runners killed by health monitor (idle_timeout=2m)
+- [ ] Runners re-provisioned up to min_runners after idle kill
diff --git a/tests/complete/config.yaml b/tests/complete/config.yaml
new file mode 100644
index 0000000..278936e
--- /dev/null
+++ b/tests/complete/config.yaml
@@ -0,0 +1,75 @@
+github:
+  url: "https://github.com/YOUR_ORG"
+  runner_group: "default"
+
+runner:
+  version: "latest"
+
+groups:
+  - name: "ghr-fast"
+    max_runners: 3
+    min_runners: 1
+    labels:
+      - "fast"
+      - "macos"
+
+  - name: "ghr-heavy"
+    max_runners: 2
+    min_runners: 0
+    labels:
+      - "heavy"
+      - "macos"
+
+  - name: "ghr-deploy"
+    max_runners: 1
+    min_runners: 0
+    labels:
+      - "deploy"
+      - "macos"
+    health:
+      runner_timeout: "10m"
+
+  - name: "ghr-single"
+    max_runners: 1
+    min_runners: 1
+    labels:
+      - "single"
+      - "macos"
+
+health:
+  enabled: true
+  check_interval: "10s"
+  runner_timeout: "5m"
+  idle_timeout: "2m"
+  divergence_timeout: "1m"
+  max_consecutive_failures: 3
+  failure_cooldown: "30s"
+  min_disk_space: "500MB"
+
+logging:
+  level: "debug"
+  format: "text"
+  retention_days: 7
+  runner_output: true
+
+notifications:
+  discord:
+    enabled: true
+    events:
+      - "health.*"
+      - "daemon.*"
+      - "runner.failed"
+      - "runner.timeout"
+    username: "ghr-test"
+    mentions:
+      error: ""
+      critical: ""
+
+monitoring:
+  uptime_kuma:
+    enabled: true
+    degraded_threshold: 0.5
+    report_health_as_ping: true
+
+daemon:
+  shutdown_timeout: "15s"
diff --git a/tests/complete/env.example b/tests/complete/env.example
new file mode 100644
index 0000000..bb35ef1
--- /dev/null
+++ b/tests/complete/env.example
@@ -0,0 +1,10 @@
+# Discord webhook — required when notifications.discord.enabled = true
+GHR_DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/XXXXXXXXXX/YYYYYYYY
+
+# Uptime Kuma — required when monitoring.uptime_kuma.enabled = true
+GHR_UPTIME_KUMA_URL=https://uptime.example.com
+GHR_UPTIME_KUMA_DAEMON_TOKEN=your-daemon-push-token
+GHR_UPTIME_KUMA_TOKEN_GHR_FAST=your-fast-group-token
+GHR_UPTIME_KUMA_TOKEN_GHR_HEAVY=your-heavy-group-token
+GHR_UPTIME_KUMA_TOKEN_GHR_DEPLOY=your-deploy-group-token
+GHR_UPTIME_KUMA_TOKEN_GHR_SINGLE=your-single-group-token
diff --git a/tests/complete/validate.sh b/tests/complete/validate.sh
new file mode 100755
index 0000000..fee7138
--- /dev/null
+++ b/tests/complete/validate.sh
@@ -0,0 +1,199 @@
+#!/bin/bash
+set -uo pipefail
+
+LOG_DIR="${GHR_LOG_DIR:-$HOME/.local/share/ghr/logs}"
+STATE_DIR="${GHR_STATE_DIR:-$HOME/.local/state/ghr}"
+RUNNER_DIR="${GHR_RUNNER_DIR:-$HOME/.local/share/ghr/runners}"
+PASS=0
+FAIL=0
+WARN=0
+
+pass() { PASS=$((PASS + 1)); printf "  \033[32m✓\033[0m %s\n" "$1"; }
+fail() { FAIL=$((FAIL + 1)); printf "  \033[31m✗\033[0m %s\n" "$1"; }
+warn() { WARN=$((WARN + 1)); printf "  \033[33m!\033[0m %s\n" "$1"; }
+section() { printf "\n\033[1m%s\033[0m\n" "$1"; }
+
+TODAY=$(date +%Y-%m-%d)
+DAEMON_LOG="$LOG_DIR/daemon/$TODAY.json"
+
+if [ ! -f "$DAEMON_LOG" ]; then
+    echo "ERROR: Daemon log not found at $DAEMON_LOG"
+    echo "Set GHR_LOG_DIR if logs are elsewhere."
+    exit 1
+fi
+
+section "=== Scale Set Management ==="
+
+GROUPS_STARTED=$(grep -c '"group listener started"' "$DAEMON_LOG" 2>/dev/null || true)
+if [ "$GROUPS_STARTED" -ge 4 ]; then pass "4 groups started ($GROUPS_STARTED listeners)"
+else fail "Expected 4 groups, got $GROUPS_STARTED"; fi
+
+for g in ghr-fast ghr-heavy ghr-deploy ghr-single; do
+    if grep -q "\"group\":\"$g\"" "$DAEMON_LOG" 2>/dev/null; then
+        pass "Group $g active"
+    else
+        fail "Group $g not found in logs"
+    fi
+done
+
+section "=== Runner Provisioning ==="
+
+TOTAL_PROVISIONED=$(grep -c '"runner provisioned"' "$DAEMON_LOG" 2>/dev/null || true)
+pass "Total runners provisioned: $TOTAL_PROVISIONED"
+
+for g in ghr-fast ghr-heavy ghr-deploy ghr-single; do
+    GROUP_LOG="$LOG_DIR/groups/$g/$TODAY.json"
+    if [ -f "$GROUP_LOG" ]; then
+        COUNT=$(grep -c '"runner provisioned"' "$GROUP_LOG" 2>/dev/null || true)
+        pass "  $g: $COUNT runners provisioned"
+    else
+        fail "  $g: no group log found"
+    fi
+done
+
+FAST_PROVISIONED=$(grep '"runner provisioned"' "$DAEMON_LOG" 2>/dev/null | grep -c '"group":"ghr-fast"' || true)
+if [ "$FAST_PROVISIONED" -ge 3 ]; then pass "ghr-fast scaled to 3+ runners"
+else fail "ghr-fast only scaled to $FAST_PROVISIONED (expected >=3)"; fi
+
+section "=== Min Runners (Pre-provisioned) ==="
+
+DAEMON_START=$(grep '"ghr starting"' "$DAEMON_LOG" | head -1 | jq -r '.time' 2>/dev/null || echo "")
+FIRST_LISTENER=$(grep '"group listener started"' "$DAEMON_LOG" | head -1 | jq -r '.time' 2>/dev/null || echo "")
+
+FAST_FIRST=$(grep '"runner provisioned"' "$DAEMON_LOG" | grep '"group":"ghr-fast"' | head -1 | jq -r '.time' 2>/dev/null || echo "")
+FAST_FIRST_JOB=$(grep '"job started"' "$DAEMON_LOG" | grep '"group":"ghr-fast"' | head -1 | jq -r '.time' 2>/dev/null || echo "")
+
+if [ -n "$FAST_FIRST" ] && [ -n "$FAST_FIRST_JOB" ]; then
+    if [[ "$FAST_FIRST" < "$FAST_FIRST_JOB" ]]; then
+        pass "ghr-fast: runner provisioned BEFORE first job (min_runners=1)"
+    else
+        fail "ghr-fast: runner provisioned AFTER first job"
+    fi
+else
+    warn "Cannot determine min_runners timing"
+fi
+
+section "=== Job Execution ==="
+
+JOBS_STARTED=$(grep -c '"job started"' "$DAEMON_LOG" 2>/dev/null || true)
+JOBS_COMPLETED=$(grep -c '"job completed"' "$DAEMON_LOG" 2>/dev/null || true)
+JOBS_SUCCEEDED=$(grep '"job completed"' "$DAEMON_LOG" 2>/dev/null | grep -c '"result":"succeeded"' || true)
+JOBS_FAILED=$(grep '"job completed"' "$DAEMON_LOG" 2>/dev/null | grep -c '"result":"failed"' || true)
+
+pass "Jobs started: $JOBS_STARTED"
+pass "Jobs completed: $JOBS_COMPLETED"
+pass "  Succeeded: $JOBS_SUCCEEDED"
+pass "  Failed: $JOBS_FAILED"
+
+if [ "$JOBS_COMPLETED" -ge 18 ]; then pass "Enough jobs completed (>= 18)"
+else fail "Only $JOBS_COMPLETED jobs completed (expected >= 18)"; fi
+
+if [ "$JOBS_FAILED" -ge 1 ]; then pass "At least 1 failed job detected (edge-fail)"
+else fail "No failed job detected"; fi
+
+section "=== Concurrency ==="
+
+FAST_LOG="$LOG_DIR/groups/ghr-fast/$TODAY.json"
+if [ -f "$FAST_LOG" ]; then
+    CONCURRENT=$(grep '"runner provisioned"' "$FAST_LOG" | head -3 | jq -r '.time[:19]' 2>/dev/null | sort -u | wc -l | tr -d ' ')
+    if [ "$CONCURRENT" -le 2 ]; then
+        pass "ghr-fast: 3 runners provisioned within same time window"
+    else
+        warn "ghr-fast: runners provisioned across $CONCURRENT distinct seconds"
+    fi
+fi
+
+HEAVY_LOG="$LOG_DIR/groups/ghr-heavy/$TODAY.json"
+if [ -f "$HEAVY_LOG" ]; then
+    HEAVY_PROV=$(grep -c '"runner provisioned"' "$HEAVY_LOG" 2>/dev/null || true)
+    if [ "$HEAVY_PROV" -ge 2 ]; then pass "ghr-heavy: scaled to 2 runners"
+    else fail "ghr-heavy: only $HEAVY_PROV runners (expected >=2)"; fi
+fi
+
+section "=== Sequential Enforcement (ghr-deploy max=1) ==="
+
+DEPLOY_LOG="$LOG_DIR/groups/ghr-deploy/$TODAY.json"
+if [ -f "$DEPLOY_LOG" ]; then
+    DEPLOY_JOBS=$(grep -c '"job started"' "$DEPLOY_LOG" 2>/dev/null || true)
+    DEPLOY_RUNNERS=$(grep '"runner provisioned"' "$DEPLOY_LOG" | jq -r '.runner' 2>/dev/null | sort -u | wc -l | tr -d ' ')
+    pass "ghr-deploy: $DEPLOY_JOBS jobs across $DEPLOY_RUNNERS unique runners"
+    if [ "$DEPLOY_JOBS" -ge 3 ]; then pass "ghr-deploy: all 3 deploy jobs ran"
+    else fail "ghr-deploy: only $DEPLOY_JOBS jobs (expected 3)"; fi
+fi
+
+section "=== Job Failure Handling ==="
+
+FAILED_RUNNER=$(grep '"job completed"' "$DAEMON_LOG" | grep '"result":"failed"' | head -1 | jq -r '.runner' 2>/dev/null || echo "")
+if [ -n "$FAILED_RUNNER" ]; then
+    pass "Failed job runner identified: $FAILED_RUNNER"
+    if grep -q "\"runner\":\"$FAILED_RUNNER\".*stopping" "$DAEMON_LOG" 2>/dev/null || \
+       grep -q "stopping.*\"runner\":\"$FAILED_RUNNER\"" "$DAEMON_LOG" 2>/dev/null; then
+        pass "Failed runner was stopped and cleaned"
+    else
+        warn "Cannot confirm failed runner cleanup in logs"
+    fi
+else
+    fail "No failed job runner found"
+fi
+
+section "=== Runner Log Files ==="
+
+RUNNER_LOG_COUNT=$(find "$LOG_DIR/groups" -path "*/runners/*/$TODAY.json" -type f 2>/dev/null | wc -l | tr -d ' ')
+pass "Runner log files created: $RUNNER_LOG_COUNT"
+
+for g in ghr-fast ghr-heavy ghr-deploy ghr-single; do
+    GROUP_RUNNERS=$(find "$LOG_DIR/groups/$g/runners" -name "$TODAY.json" -type f 2>/dev/null | wc -l | tr -d ' ')
+    pass "  $g: $GROUP_RUNNERS runner logs"
+done
+
+section "=== Duration Stats ==="
+
+if grep -q '"duration"' "$DAEMON_LOG" 2>/dev/null; then
+    pass "Job durations logged"
+    echo "    Durations:"
+    grep '"job completed"' "$DAEMON_LOG" | jq -r '  "    " + .runner + ": " + (.duration // "n/a")' 2>/dev/null | head -10
+else
+    warn "No duration data in logs"
+fi
+
+section "=== Cleanup State ==="
+
+ORPHAN_PROCS=$(pgrep -f "Runner.Listener" 2>/dev/null | wc -l | tr -d ' ')
+if [ "$ORPHAN_PROCS" -eq 0 ]; then pass "No orphan runner processes"
+else fail "$ORPHAN_PROCS orphan processes found"; fi
+
+WORKDIR_CONTENT=$(find "$RUNNER_DIR" -mindepth 2 -maxdepth 2 -type d 2>/dev/null | wc -l | tr -d ' ')
+if [ "$WORKDIR_CONTENT" -eq 0 ]; then pass "All runner workdirs cleaned"
+else fail "$WORKDIR_CONTENT workdirs remain"; fi
+
+if [ ! -f "$STATE_DIR/daemon.pid" ]; then pass "PID file removed"
+else fail "PID file still exists"; fi
+
+if [ ! -S "$STATE_DIR/ghr.sock" ]; then pass "Socket removed"
+else fail "Socket still exists"; fi
+
+section "=== Log Structure ==="
+
+if [ -f "$LOG_DIR/daemon/$TODAY.json" ]; then pass "Daemon log exists"
+else fail "Daemon log missing"; fi
+
+for g in ghr-fast ghr-heavy ghr-deploy ghr-single; do
+    if [ -f "$LOG_DIR/groups/$g/$TODAY.json" ]; then pass "Group log $g exists"
+    else fail "Group log $g missing"; fi
+done
+
+DAEMON_LINES=$(wc -l < "$DAEMON_LOG" | tr -d ' ')
+pass "Daemon log entries: $DAEMON_LINES"
+
+section "=== Notifications ==="
+
+NOTIF_EVENTS=$(grep -E '"runner\.failed"|"runner\.started"|"daemon\.start"' "$DAEMON_LOG" 2>/dev/null | wc -l | tr -d ' ')
+if [ "$NOTIF_EVENTS" -ge 1 ]; then pass "Notification events emitted: $NOTIF_EVENTS"
+else warn "No notification events found in daemon log"; fi
+
+section "========================================="
+printf "\033[1m  Results: \033[32m%d passed\033[0m, \033[31m%d failed\033[0m, \033[33m%d warnings\033[0m\n" "$PASS" "$FAIL" "$WARN"
+section "========================================="
+
+if [ "$FAIL" -gt 0 ]; then exit 1; fi
+exit 0
diff --git a/tests/complete/workflow.yml b/tests/complete/workflow.yml
new file mode 100644
index 0000000..d2efe36
--- /dev/null
+++ b/tests/complete/workflow.yml
@@ -0,0 +1,548 @@
+name: ghr v2 complete test
+on:
+  workflow_dispatch:
+    inputs:
+      stress_level:
+        description: "Number of parallel stress jobs per group"
+        default: "3"
+        type: choice
+        options: ["2", "3", "5"]
+
+jobs:
+
+  # =============================================
+  # PHASE 1: STARTUP VALIDATION
+  # Runs immediately — checks min_runners pre-provisioning
+  # =============================================
+
+  startup-fast:
+    runs-on: ghr-fast
+    steps:
+      - run: |
+          echo "Runner: $RUNNER_NAME"
+          echo "This runner should already exist (min_runners=1)"
+          echo "Startup time: $(date -u +%H:%M:%S)"
+
+  startup-single:
+    runs-on: ghr-single
+    steps:
+      - run: |
+          echo "Runner: $RUNNER_NAME"
+          echo "This runner should already exist (min_runners=1)"
+          echo "Startup time: $(date -u +%H:%M:%S)"
+
+  # =============================================
+  # PHASE 2: CONCURRENT SCALE-UP
+  # Hit max_runners on each group simultaneously
+  # =============================================
+
+  fast-burst-1:
+    runs-on: ghr-fast
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "fast-burst-1 | Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 30
+
+  fast-burst-2:
+    runs-on: ghr-fast
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "fast-burst-2 | Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 30
+
+  fast-burst-3:
+    runs-on: ghr-fast
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "fast-burst-3 | Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 30
+
+  heavy-burst-1:
+    runs-on: ghr-heavy
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "heavy-burst-1 | Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 40
+
+  heavy-burst-2:
+    runs-on: ghr-heavy
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "heavy-burst-2 | Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 40
+
+  # =============================================
+  # PHASE 3: QUEUING PRESSURE
+  # More jobs than max_runners — forces queuing
+  # =============================================
+
+  fast-queue-1:
+    runs-on: ghr-fast
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "fast-queue-1 (may be queued) | Runner: $RUNNER_NAME"
+          sleep 20
+
+  fast-queue-2:
+    runs-on: ghr-fast
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "fast-queue-2 (may be queued) | Runner: $RUNNER_NAME"
+          sleep 20
+
+  fast-queue-3:
+    runs-on: ghr-fast
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "fast-queue-3 (may be queued) | Runner: $RUNNER_NAME"
+          sleep 20
+
+  heavy-queue-1:
+    runs-on: ghr-heavy
+    needs: [startup-fast]
+    steps:
+      - run: |
+          echo "heavy-queue-1 (may be queued) | Runner: $RUNNER_NAME"
+          sleep 25
+
+  # =============================================
+  # PHASE 4: REAL WORKLOADS
+  # Simulate actual CI tasks
+  # =============================================
+
+  real-checkout:
+    runs-on: ghr-fast
+    needs: [fast-burst-1]
+    steps:
+      - uses: actions/checkout@v4
+      - run: |
+          echo "=== Checkout completed ==="
+          echo "Files: $(find . -type f | wc -l)"
+          echo "Disk: $(du -sh .)"
+          ls -la
+
+  real-build:
+    runs-on: ghr-heavy
+    needs: [heavy-burst-1]
+    steps:
+      - run: |
+          echo "=== Simulating real build ==="
+          mkdir -p build/output
+          for i in $(seq 1 20); do
+            dd if=/dev/urandom bs=1024 count=100 of=build/output/artifact-$i.bin 2>/dev/null
+            echo "Built artifact $i/20"
+          done
+          echo "Total size: $(du -sh build/)"
+          echo "Disk free: $(df -h / | tail -1)"
+
+  real-test-matrix:
+    runs-on: ghr-fast
+    needs: [fast-burst-2]
+    strategy:
+      matrix:
+        test-suite: [unit, integration, e2e]
+      fail-fast: false
+    steps:
+      - run: |
+          echo "=== Test suite: ${{ matrix.test-suite }} ==="
+          echo "Runner: $RUNNER_NAME"
+          case "${{ matrix.test-suite }}" in
+            unit)        sleep 10; echo "47 tests passed" ;;
+            integration) sleep 15; echo "23 tests passed" ;;
+            e2e)         sleep 20; echo "8 tests passed" ;;
+          esac
+
+  real-cpu-stress:
+    runs-on: ghr-heavy
+    needs: [heavy-burst-2]
+    steps:
+      - run: |
+          echo "=== CPU stress test ==="
+          echo "Runner: $RUNNER_NAME"
+          echo "Cores: $(sysctl -n hw.ncpu)"
+          echo "Starting prime calculation..."
+          start=$(date +%s)
+          python3 -c "
+          import math
+          primes = []
+          for n in range(2, 50000):
+              if all(n % p != 0 for p in primes):
+                  primes.append(n)
+          print(f'Found {len(primes)} primes up to 50000')
+          "
+          end=$(date +%s)
+          echo "Duration: $((end - start))s"
+
+  real-disk-io:
+    runs-on: ghr-heavy
+    needs: [real-build]
+    steps:
+      - run: |
+          echo "=== Disk I/O test ==="
+          echo "Runner: $RUNNER_NAME"
+          mkdir -p /tmp/ghr-io-test
+          echo "Writing 100MB..."
+          dd if=/dev/zero bs=1M count=100 of=/tmp/ghr-io-test/testfile 2>&1
+          echo "Reading back..."
+          dd if=/tmp/ghr-io-test/testfile of=/dev/null bs=1M 2>&1
+          rm -rf /tmp/ghr-io-test
+          echo "Disk free after cleanup: $(df -h / | tail -1)"
+
+  real-network:
+    runs-on: ghr-fast
+    needs: [fast-burst-3]
+    steps:
+      - run: |
+          echo "=== Network test ==="
+          echo "Runner: $RUNNER_NAME"
+          echo "DNS resolution..."
+          nslookup github.com
+          echo "HTTP request..."
+          curl -s -o /dev/null -w "Status: %{http_code}\nTime: %{time_total}s\nSize: %{size_download} bytes\n" https://api.github.com
+          echo "IP: $(curl -s ifconfig.me)"
+
+  real-env-check:
+    runs-on: ghr-single
+    needs: [startup-single]
+    steps:
+      - run: |
+          echo "=== Environment check ==="
+          echo "Runner: $RUNNER_NAME"
+          echo "OS: $(sw_vers -productName) $(sw_vers -productVersion)"
+          echo "Arch: $(uname -m)"
+          echo "Shell: $SHELL"
+          echo "User: $(whoami)"
+          echo "Home: $HOME"
+          echo "Cores: $(sysctl -n hw.ncpu)"
+          echo "Memory: $(sysctl -n hw.memsize | awk '{print $1/1024/1024/1024 " GB"}')"
+          echo "Disk: $(df -h / | tail -1)"
+          echo "Go: $(go version 2>/dev/null || echo 'not installed')"
+          echo "Python: $(python3 --version 2>/dev/null || echo 'not installed')"
+          echo "Node: $(node --version 2>/dev/null || echo 'not installed')"
+          echo "Git: $(git --version)"
+
+  # =============================================
+  # PHASE 5: ERROR HANDLING
+  # Various failure modes
+  # =============================================
+
+  error-exit-1:
+    runs-on: ghr-fast
+    needs: [fast-queue-1]
+    steps:
+      - run: echo "About to fail with exit 1"
+      - run: exit 1
+
+  error-exit-2:
+    runs-on: ghr-fast
+    needs: [fast-queue-2]
+    steps:
+      - run: exit 2
+
+  error-bad-command:
+    runs-on: ghr-fast
+    needs: [fast-queue-3]
+    steps:
+      - run: this-command-does-not-exist-at-all
+        continue-on-error: true
+      - run: echo "Continued after bad command"
+
+  error-timeout:
+    runs-on: ghr-fast
+    needs: [error-exit-1]
+    if: always()
+    timeout-minutes: 1
+    steps:
+      - run: |
+          echo "This job has a 1 minute timeout"
+          echo "Sleeping 45s (under the limit)..."
+          sleep 45
+          echo "Finished before timeout"
+
+  error-recovery:
+    runs-on: ghr-fast
+    needs: [error-exit-1, error-exit-2]
+    if: always()
+    steps:
+      - run: |
+          echo "=== Recovery after failures ==="
+          echo "Runner: $RUNNER_NAME"
+          echo "This proves the group still works after failed jobs"
+
+  # =============================================
+  # PHASE 6: SEQUENTIAL PIPELINE (ghr-deploy)
+  # Strict ordering, max=1 enforcement
+  # =============================================
+
+  deploy-validate:
+    runs-on: ghr-deploy
+    needs: [real-build, real-test-matrix]
+    steps:
+      - run: |
+          echo "=== Deploy: validation ==="
+          echo "Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 10
+          echo "Validation passed"
+
+  deploy-staging:
+    runs-on: ghr-deploy
+    needs: [deploy-validate]
+    steps:
+      - run: |
+          echo "=== Deploy: staging ==="
+          echo "Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 15
+          echo "Staging deployed"
+
+  deploy-smoke-test:
+    runs-on: ghr-deploy
+    needs: [deploy-staging]
+    steps:
+      - run: |
+          echo "=== Deploy: smoke test ==="
+          echo "Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 5
+          echo "Smoke test passed"
+
+  deploy-production:
+    runs-on: ghr-deploy
+    needs: [deploy-smoke-test]
+    steps:
+      - run: |
+          echo "=== Deploy: production ==="
+          echo "Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 15
+          echo "Production deployed"
+
+  deploy-verify:
+    runs-on: ghr-deploy
+    needs: [deploy-production]
+    steps:
+      - run: |
+          echo "=== Deploy: verification ==="
+          echo "Runner: $RUNNER_NAME | $(date -u +%H:%M:%S)"
+          sleep 5
+          echo "Production verified"
+
+  # =============================================
+  # PHASE 7: SECOND WAVE
+  # After first batch completes — tests scale-down then scale-up
+  # =============================================
+
+  wave2-fast-1:
+    runs-on: ghr-fast
+    needs: [error-recovery, real-network]
+    steps:
+      - run: |
+          echo "=== Wave 2 fast-1 ==="
+          echo "Runner: $RUNNER_NAME"
+          echo "Runners should have scaled down then back up"
+          sleep 15
+
+  wave2-fast-2:
+    runs-on: ghr-fast
+    needs: [error-recovery, real-network]
+    steps:
+      - run: |
+          echo "=== Wave 2 fast-2 ==="
+          echo "Runner: $RUNNER_NAME"
+          sleep 15
+
+  wave2-fast-3:
+    runs-on: ghr-fast
+    needs: [error-recovery, real-network]
+    steps:
+      - run: |
+          echo "=== Wave 2 fast-3 ==="
+          echo "Runner: $RUNNER_NAME"
+          sleep 15
+
+  wave2-heavy:
+    runs-on: ghr-heavy
+    needs: [real-disk-io, real-cpu-stress]
+    steps:
+      - run: |
+          echo "=== Wave 2 heavy ==="
+          echo "Runner: $RUNNER_NAME"
+          sleep 20
+
+  wave2-single-1:
+    runs-on: ghr-single
+    needs: [real-env-check]
+    steps:
+      - run: |
+          echo "=== Wave 2 single-1 ==="
+          echo "Runner: $RUNNER_NAME"
+          sleep 10
+
+  wave2-single-2:
+    runs-on: ghr-single
+    needs: [wave2-single-1]
+    steps:
+      - run: |
+          echo "=== Wave 2 single-2 ==="
+          echo "Runner: $RUNNER_NAME"
+          sleep 10
+
+  # =============================================
+  # PHASE 8: RAPID FIRE
+  # Many instant jobs to stress provisioning/cleanup
+  # =============================================
+
+  rapid-1:
+    runs-on: ghr-fast
+    needs: [wave2-fast-1]
+    steps:
+      - run: echo "rapid-1 | $RUNNER_NAME"
+
+  rapid-2:
+    runs-on: ghr-fast
+    needs: [wave2-fast-1]
+    steps:
+      - run: echo "rapid-2 | $RUNNER_NAME"
+
+  rapid-3:
+    runs-on: ghr-fast
+    needs: [wave2-fast-1]
+    steps:
+      - run: echo "rapid-3 | $RUNNER_NAME"
+
+  rapid-4:
+    runs-on: ghr-fast
+    needs: [rapid-1]
+    steps:
+      - run: echo "rapid-4 | $RUNNER_NAME"
+
+  rapid-5:
+    runs-on: ghr-fast
+    needs: [rapid-2]
+    steps:
+      - run: echo "rapid-5 | $RUNNER_NAME"
+
+  rapid-6:
+    runs-on: ghr-fast
+    needs: [rapid-3]
+    steps:
+      - run: echo "rapid-6 | $RUNNER_NAME"
+
+  # =============================================
+  # PHASE 9: LONG RUNNING
+  # Tests that runners survive for longer periods
+  # =============================================
+
+  long-running:
+    runs-on: ghr-heavy
+    needs: [wave2-heavy]
+    steps:
+      - run: |
+          echo "=== Long running job ==="
+          echo "Runner: $RUNNER_NAME"
+          echo "Start: $(date -u +%H:%M:%S)"
+          for i in $(seq 1 12); do
+            echo "Tick $i/12 (10s each): $(date -u +%H:%M:%S) | Memory: $(vm_stat | head -5 | tail -1)"
+            sleep 10
+          done
+          echo "End: $(date -u +%H:%M:%S)"
+          echo "Total: ~2 minutes"
+
+  # =============================================
+  # PHASE 10: CROSS-GROUP JOB OUTPUTS
+  # Tests data passing between jobs on different groups
+  # =============================================
+
+  output-producer:
+    runs-on: ghr-heavy
+    needs: [long-running]
+    outputs:
+      build_id: ${{ steps.gen.outputs.build_id }}
+      timestamp: ${{ steps.gen.outputs.timestamp }}
+    steps:
+      - id: gen
+        run: |
+          BUILD_ID="build-$(date +%s)-$(openssl rand -hex 4)"
+          echo "build_id=$BUILD_ID" >> "$GITHUB_OUTPUT"
+          echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$GITHUB_OUTPUT"
+          echo "Generated: $BUILD_ID"
+
+  output-consumer-fast:
+    runs-on: ghr-fast
+    needs: [output-producer]
+    steps:
+      - run: |
+          echo "=== Cross-group output ==="
+          echo "Build ID from heavy group: ${{ needs.output-producer.outputs.build_id }}"
+          echo "Timestamp: ${{ needs.output-producer.outputs.timestamp }}"
+
+  output-consumer-deploy:
+    runs-on: ghr-deploy
+    needs: [output-producer, deploy-verify]
+    steps:
+      - run: |
+          echo "=== Deploy with build ID ==="
+          echo "Deploying build: ${{ needs.output-producer.outputs.build_id }}"
+          sleep 5
+
+  # =============================================
+  # SUMMARY
+  # =============================================
+
+  summary:
+    runs-on: ghr-fast
+    needs:
+      - wave2-fast-2
+      - wave2-fast-3
+      - wave2-single-2
+      - rapid-4
+      - rapid-5
+      - rapid-6
+      - error-timeout
+      - error-bad-command
+      - output-consumer-fast
+      - output-consumer-deploy
+      - long-running
+    if: always()
+    steps:
+      - run: |
+          echo "========================================="
+          echo "  ghr v2 complete test — SUMMARY"
+          echo "========================================="
+          echo ""
+          echo "Time: $(date -u)"
+          echo "Runner: $RUNNER_NAME | Host: $(hostname)"
+          echo ""
+          echo "Groups exercised:"
+          echo "  ghr-fast   (max=3, min=1) — burst, queue, matrix, rapid fire"
+          echo "  ghr-heavy  (max=2, min=0) — build, CPU, disk I/O, long run"
+          echo "  ghr-deploy (max=1, min=0) — 5-stage pipeline, sequential"
+          echo "  ghr-single (max=1, min=1) — always-on, env check"
+          echo ""
+          echo "Scenarios tested:"
+          echo "  [Scale]     Pre-provisioned min_runners"
+          echo "  [Scale]     Burst to max_runners"
+          echo "  [Scale]     Queuing under pressure"
+          echo "  [Scale]     Scale-down between waves"
+          echo "  [Scale]     Scale-up on second wave"
+          echo "  [Work]      Git checkout"
+          echo "  [Work]      File I/O (100MB write/read)"
+          echo "  [Work]      CPU stress (prime trial division)"
+          echo "  [Work]      Network (DNS + HTTP)"
+          echo "  [Work]      Matrix strategy (3 suites)"
+          echo "  [Work]      Cross-group job outputs"
+          echo "  [Work]      Long running (2 min)"
+          echo "  [Error]     exit 1 / exit 2"
+          echo "  [Error]     Bad command + continue-on-error"
+          echo "  [Error]     Job finishing under timeout (45s of 1 min limit)"
+          echo "  [Error]     Recovery after failures"
+          echo "  [Pipeline]  5-stage deploy (validate→staging→smoke→prod→verify)"
+          echo "  [Rapid]     6 instant jobs back-to-back"
+          echo "  [Env]       System info (OS, arch, memory, disk)"
+          echo ""
+          echo "========================================="
diff --git a/tests/simple/README.md b/tests/simple/README.md
new file mode 100644
index 0000000..8d5c6ee
--- /dev/null
+++ b/tests/simple/README.md
@@ -0,0 +1,22 @@
+# Simple Test
+
+Minimal test: 1 group, 1 runner, 1 job.
+
+## Setup
+
+```bash
+ghr login
+ghr run --config tests/simple/config.yaml
+```
+
+## Trigger
+
+Copy `workflow.yml` to `.github/workflows/test-simple.yml` in your repo.
+Then trigger it from the repository's Actions tab with "Run workflow".
+
+## Expected
+
+- 1 scale set created
+- 1 runner provisioned on job dispatch
+- Job completes, runner cleaned up
+- Ctrl+C stops cleanly
diff --git a/tests/simple/config.yaml b/tests/simple/config.yaml
new file mode 100644
index 0000000..23dc600
--- /dev/null
+++ b/tests/simple/config.yaml
@@ -0,0 +1,3 @@
+groups:
+  - name: "test-simple"
+    max_runners: 1
diff --git a/tests/simple/workflow.yml b/tests/simple/workflow.yml
new file mode 100644
index 0000000..9569eef
--- /dev/null
+++ b/tests/simple/workflow.yml
@@ -0,0 +1,8 @@
+name: ghr simple test
+on: workflow_dispatch
+
+jobs:
+  hello:
+    runs-on: test-simple
+    steps:
+      - run: echo "Hello from ghr!"