GHOST

Grid Homeostasis & Orchestrated Self-healing Topology

Autonomous detection and remediation for container-style failures — proven on a deterministic, audit-friendly loop.

Executive snapshot: External replay recall improved from 6/11 to 11/11 on the same captured lab run, while holding 0 false positives.


Repository · Specification · Help & FAQ · Governance template · Quick start


Progress snapshot (front and center)

What has been delivered so far, with published evidence:

  • Core deterministic loop is operational and validated across five harness experiments.
  • Optional local Kubernetes lab pipeline is operational (bootstrap -> inject -> collect/normalize/replay).
  • Published external replay report shows concrete improvement:
    • before normalization refinement: 6/11 detected/resolved, 0 false positives
    • after normalization refinement: 11/11 detected/resolved, 0 false positives
  • Full report: docs/LAB_RUN_REPORT_20260331.md

Data flow: where we started -> where we are now

```mermaid
flowchart LR
  subgraph then["Start (initial PoC scope)"]
    A1[data/seed.py synthetic JSON]
    A2[Watcher -> Healer]
    A3[harness.py experiments 1-3]
    A4[metrics/results.db]
    A1 --> A2 --> A3 --> A4
  end

  subgraph now["Current state (expanded scope)"]
    B1[seed + k8s_clean_signals + near_real_stream]
    B2[harness.py experiments 1-5 + integrations/validate.py gate]
    B3[Optional lab pipeline: bootstrap -> inject -> collect]
    B4[tools/normalize_external_capture.py]
    B5[external replay scoring]
    B6[published report + metrics]
    B1 --> B2 --> B6
    B3 --> B4 --> B5 --> B6
  end
```


Overview

GHOST is a reference implementation of a closed control loop:

log signal → structured event → policy lookup → corrective action → measured outcome

It targets workload-agnostic container runtime failure modes (OOM-style kills, crash loops, probe failures, latency thresholds) using explicit patterns and decision tables — not an LLM and not a third-party agent framework. Phase 1 runs entirely on your machine: synthetic logs, an in-memory service model, and a reproducible harness with SQLite metrics.

| Capability | Phase 1 |
| --- | --- |
| Real cloud / cluster APIs | No (simulated state) |
| LLM reasoning | No (deterministic matching) |
| External Python packages | No (standard library) |
| Repeatable experiment suite | Yes (harness.py — five experiments + SQLite metrics) |
| Policy separated from agent code | Yes (skills/ modules) |
| Integration contract gate | Yes (integrations/validate.py at start of harness.py) |
| Local file adapters (observe / lab) | Optional (adapters/ — not in CI) |

Why GHOST exists

Containers fail when operators are not staring at dashboards. Logs often already contain the diagnosis; runbooks describe the fix. The weak link is frequently the latency and variance of the human chain: page → wake → context switch → manual execution.

GHOST answers one precise question from our engineering specification:

Can a lightweight system detect a known container runtime failure from a log stream and execute the correct corrective action faster and more reliably than a human — with zero human input after start?

We care because MTTR under automation is measurable. This repository isolates the autonomous loop so we can prove behavior and regression-test it before attaching real infrastructure, identity systems, or richer reasoning layers.


What we built

Concretely, this repository delivers:

| Layer | Implementation |
| --- | --- |
| Detection policy | skills/watcher_skills.py — substring sets per failure type, watched severities, event schema, explicit CANNOT_DO boundaries. |
| Remediation policy | skills/healer_skills.py — decision table (failure_type → action, params), timeouts, default unknown handler, outcome schema. |
| Watcher agent | agents/watcher.py — imports patterns only from watcher skills; emits validated events on ERROR / WARNING lines. |
| K8s signal policy | skills/k8s_signal_skills.py — ordered declarative rules on a signal object (record_type, phase, reason, etc.). |
| K8s signal agent | agents/k8s_watcher.py — imports only k8s_signal_skills; same event envelope as the log Watcher so the Healer stays unified. |
| Healer agent | agents/healer.py — imports the decision table only from healer skills; executes registered actions against shared state. |
| Event fabric | blackboard/event_bus.py — asyncio.Queue with schema validation (typed handoff between agents). |
| Simulated platform | simulator/infra_state.py — app-service baseline dict; container actions plus K8s-shaped fields (image, replicas_*, scheduling_blocked, node_ready) and matching heal actions. |
| Synthetic data | data/seed.py — log datasets, k8s_clean_signals.json, plus near_real_stream.json (200 noisy multi-line / kube-prefixed lines, 20 failures); outputs are gitignored. |
| Streaming | data/generator.py — async replay of JSON records for experiments. |
| Experiments | run_experiment1.py through run_experiment5.py — mixed stream, K8s signals, and near-real noisy stream stress test. |
| Adapters (optional) | adapters/observe.py (Watcher-only file tail), adapters/lab_run.py (--dry-run or full loop on simulator). Not run in CI. |
| Lab data pipeline (optional) | lab/ + tools/ scripts: bootstrap/inject/collect/normalize/replay for external datasets; local only, not in CI. |
| Harness & metrics | harness.py + metrics/recorder.py — orchestrates all scenarios, prints a summary, persists rows to metrics/results.db. |

Design rule: agents never duplicate patterns or decision tables inline — skills are the single source of truth for review, diff, and compliance-style audits.


How it works

  1. Watcher scans each log record (optionally tagged with a stream index). If severity is in scope, it walks DETECTABLE_PATTERNS in order and publishes one event on the first substring hit in message.
  2. Healer awaits an event, resolves (action, params) via DECISION_TABLE (or DEFAULT_ACTION), runs the matching function in ACTION_REGISTRY on infra_state, then runs POST_HEAL_VERIFIERS from healer_skills.py on the updated state (unless dry_run or log_unknown). If the predicate fails, success is false even when the action raised no exception. Timing uses asyncio.wait_for per skill timeouts. For shadow / lab, heal_once(..., dry_run=True) skips mutation and skips verification.
  3. Harness resets metrics DB, runs integrations/validate.py (required paths + Hermes policy shape), then drives five experiments: log detection, log full loop, mixed stream (100/10), structured K8s-style signals (k8s_clean_signals.json), and near-real noisy stream (200/20, near_real_stream.json). On failure it exits non-zero (CI uses the same path).
```mermaid
flowchart TB
  subgraph policy [Policy layer]
    WSK[skills/watcher_skills.py]
    HSK[skills/healer_skills.py]
  end
  subgraph runtime [Runtime loop]
    JSON[Generated JSON logs]
    W[Watcher]
    Q[asyncio Queue]
    H[Healer]
    INFRA[infra_state]
    DB[(metrics/results.db)]
  end
  WSK -.-> W
  HSK -.-> H
  JSON --> W
  W --> Q
  Q --> H
  H --> INFRA
  H --> DB
```

Detection design (broader coverage, less bias)

  • Case-insensitive matching — Log lines are matched with Unicode casefold, and severities accept any casing (e.g. error / ERROR). That avoids favoring one vendor’s capitalization (Kubernetes vs Docker vs PaaS logs).
  • Vendor-neutral phrases — DETECTABLE_PATTERNS includes multiple paraphrases per class (OOM / cgroup wording, crash-loop and backoff wording, probe and health-check failures, latency and timeout phrasing) so the PoC is not tuned to a single message shape.
  • Diverse synthetic failures — data/seed.py picks among several templates per failure type for clean and mixed datasets, so experiments are not overfit to four fixed strings.
  • Shared healthy check — The seed script uses the same any_pattern_matches_message() helper as policy in watcher_skills.py, so “no false patterns in healthy logs” is evaluated with the same rules as the Watcher (healthy lines were adjusted so phrases like “response time … within threshold” do not collide with latency rules once matching is case-insensitive).

The first failure type whose pattern matches in DETECTABLE_PATTERNS iteration order wins; patterns are ordered so higher-signal phrases are checked first, in a stable priority.

Kubernetes-style structured signals (Experiment 4)

This is not a live cluster client: it is the same Watcher → Healer loop fed by JSON that resembles what you would derive from kube-apiserver watches (Pod/Node/Deployment-shaped objects).

| Synthetic class | Typical real-world analogue | Simulated heal |
| --- | --- | --- |
| ImagePullBackOff / ErrImagePull | Bad image tag, registry auth | Roll back to image_previous |
| SchedulingBlocked | FailedScheduling (resources, taints) | Clear scheduling_blocked |
| NodeNotReady | Node condition NotReady | Set node_ready |
| ReplicaMismatch | Deployment ready ≠ desired | sync_replicas |
| PodDown (Evicted) | Pod Failed + evicted / node pressure | restore_workload |

Why this matters: log substring matching alone is biased toward whatever format your app prints. Production agents usually combine typed API objects + events + metrics. Experiment 4 is a stdlib-only stepping stone: swap signal ingestion for an informer later without changing the Healer contract.

Near-real stream (Experiment 5) & local adapters

Experiment 5 replays near_real_stream.json (from seed.py): 200 records with kube-style timestamps, optional multi-line / stack-ish prefixes, and sometimes JSON-shaped log lines; 20 failures are shuffled among 180 healthy records. It applies the same scoring rules as Experiment 3 (detect / false positives / resolve vs near_real_ground_truth.json). This is still synthetic text — it stress-tests the current substring policy, not your production corpus.

Adapters (under adapters/) are optional tools for local workflows and are not executed in CI:

| Script | Purpose |
| --- | --- |
| adapters/observe.py | JSON array file → Watcher only → JSONL detection lines (no Healer). |
| adapters/lab_run.py | Same file → Watcher + Healer on the simulator; use --dry-run to skip ACTION_REGISTRY side effects. |

For rollout tiers, charter, and game-day checklist (process only), see docs/GOVERNANCE.md.

External lab replay pipeline (optional)

For higher-fidelity local data without expanding CI scope, this repo includes a minimal pipeline:

  1. Bootstrap lab and deploy a test workload (lab/bootstrap_lab.ps1).
  2. Inject deterministic failures (lab/inject_failures.ps1).
  3. Collect events/logs (tools/collect_k8s_lab_data.py).
  4. Normalize to GHOST replay shape + ground truth (tools/normalize_external_capture.py).
  5. Score with the same Watcher/Healer loop (tools/run_external_replay.py → experiments/run_experiment_external.py).

One-command wrapper (PowerShell): lab/collect_and_normalize.ps1.

This path is local-only and not wired into harness.py or CI.

Latest published local lab run report: docs/LAB_RUN_REPORT_20260331.md.


Validation & results

Continuous integration: every push and pull request to main runs seed.py and harness.py on Python 3.11 via GitHub Actions (see the CI badge at the top). harness.py first runs integrations/validate.py (stdlib check for contract files and core paths).

Locally, the same commands execute:

| Experiment | What it proves | Expected outcome |
| --- | --- | --- |
| 1 — Detection | Watcher finds all four failure types on clean logs | 4 / 4 scenarios PASS |
| 2 — Full loop | Healer applies correct mutations after each clean failure (infra reset per scenario) | 4 / 4 assertions PASS (memory, port, instances, restart semantics) |
| 3 — Mixed stream | 100 lines: 90 healthy + 10 injected failures | 10 / 10 detected, 0 false positives on healthy lines, 10 / 10 resolved vs ground truth |
| 4 — K8s signals | 6 structured signal records (2× image pull paths + scheduling + node + replicas + evicted pod) | 6 / 6 PASS |
| 5 — Near-real noisy stream | 200 lines: 180 healthy + 20 injected failures (noisy envelopes) | 20 / 20 detected, 0 false positives on healthy lines, 20 / 20 resolved vs ground truth |

Timing: On fast local hardware, reported detect/decide/act milliseconds may round to 0 ms; correctness is enforced by assertions, not wall-clock drama. Add delays in the generator or real I/O when you need representative latency distributions.
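One hedged way to add such delays is per-record jitter in an async replay generator; the function and delay bounds below are illustrative, not the repo's data/generator.py:

```python
import asyncio
import random
import time

async def replay(records, min_delay=0.001, max_delay=0.005):
    # Async replay with per-record jitter so detect/decide/act timings
    # are no longer rounded to 0 ms (delay bounds here are arbitrary).
    for record in records:
        await asyncio.sleep(random.uniform(min_delay, max_delay))
        yield record

async def main():
    start = time.perf_counter()
    seen = [r async for r in replay([{"i": i} for i in range(5)])]
    return seen, time.perf_counter() - start

seen, elapsed = asyncio.run(main())
print(len(seen), elapsed)  # 5 records; elapsed at least 5 x min_delay
```

The same idea applies to real I/O: any awaited latency between records makes the recorded millisecond columns meaningful without changing detection or healing logic.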

All runs append structured rows to metrics/results.db for downstream reporting or dashboards. Each successful harness run also appends a JSON summary to feedback_rows (policy versions, per-experiment pass flags, Experiment 3 and 5 counts) via metrics/feedback.py.


Published reports

The repo now includes an executed lab report with concrete artifact paths and replay metrics:

Latest published highlights from that report:

  • Initial external replay on captured lab data: detected 6/11, resolved 6/11, false_positives 0.
  • Follow-up normalization fix (BackOff pull-image mapping) on the same run: detected 11/11, resolved 11/11, false_positives 0.
  • Production meaning: recall bottleneck was in normalization semantics, not in the core Watcher/Healer execution path.

Command reference

Run all commands from repository root.

| Goal | Command |
| --- | --- |
| Generate all synthetic datasets | python data/seed.py |
| Run full CI-equivalent harness | python harness.py |
| Watcher-only on a file stream | python adapters/observe.py data/mixed_stream.json |
| Full loop in dry-run mode | python adapters/lab_run.py --dry-run data/near_real_stream.json |
| Full loop with simulator mutation | python adapters/lab_run.py data/mixed_stream.json |
| Bootstrap local K8s lab | ./lab/bootstrap_lab.ps1 |
| Inject deterministic lab failures | ./lab/inject_failures.ps1 |
| Collect + normalize + replay lab data | ./lab/collect_and_normalize.ps1 |
| Manual external replay | python tools/run_external_replay.py --data data/external/runs/<run-id>/normalized.json --ground-truth data/external/runs/<run-id>/ground_truth.json --record |
| Validate integration contract files only | python integrations/validate.py |
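The replay scoring behind numbers like "11/11 detected, 0 false positives" reduces to comparing detections against labeled ground truth by stream position. A minimal sketch (field names such as index, failure_type, and resolved are illustrative, not the exact replay schema):

```python
def score(detections, ground_truth):
    """Compare detected events against labeled ground truth by stream index.
    Field names (index, failure_type, resolved) are illustrative."""
    truth = {g["index"]: g["failure_type"] for g in ground_truth}
    detected = sum(1 for d in detections
                   if truth.get(d["index"]) == d["failure_type"])
    false_positives = sum(1 for d in detections if d["index"] not in truth)
    resolved = sum(1 for d in detections
                   if truth.get(d["index"]) == d["failure_type"] and d.get("resolved"))
    return {"detected": detected, "resolved": resolved,
            "false_positives": false_positives, "total": len(truth)}

gt = [{"index": 3, "failure_type": "oom_kill"},
      {"index": 7, "failure_type": "crash_loop"}]
dets = [{"index": 3, "failure_type": "oom_kill", "resolved": True},
        {"index": 7, "failure_type": "crash_loop", "resolved": True}]
print(score(dets, gt))
# {'detected': 2, 'resolved': 2, 'false_positives': 0, 'total': 2}
```

Any detection on an index absent from the ground truth counts as a false positive, which is why "0 false positives" on healthy lines is as load-bearing a claim as full recall.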

Use cases

| Use case | What to run | Output / decision value |
| --- | --- | --- |
| Validate policy correctness before any infra work | python data/seed.py then python harness.py | Reproducible pass/fail across five experiments; blocks policy regressions early. |
| Observe-only triage on captured logs | python adapters/observe.py <path-to-json-array> | Detection events only; no state mutation; safe for shadow analysis. |
| Dry-run autonomous response rehearsal | python adapters/lab_run.py --dry-run <path-to-json-array> | End-to-end detect/decide trace without applying actions. |
| Evaluate action correctness in simulator | python adapters/lab_run.py <path-to-json-array> | Simulated state transitions + post-heal verification outcomes. |
| Reproduce Kubernetes-style incidents locally | ./lab/bootstrap_lab.ps1, ./lab/inject_failures.ps1, ./lab/collect_and_normalize.ps1 | Captured artifacts + replay score on near-real local signals. |
| Measure external replay quality over time | python tools/run_external_replay.py ... --record | Detection/precision/resolution metrics appended for trend tracking. |
| Author policy updates safely | Edit skills/, then python harness.py and python integrations/validate.py | Enforces skills-as-policy boundary and integration contract completeness. |
| Prepare production rollout process | Fill docs/GOVERNANCE.md | Defines autonomy tiers, blast radius, and change control before live execution. |

Data: synthetic vs real-world samples

Nothing stops you from using real or open-source log data — the project ships synthetic JSON by default for four practical reasons:

| Reason | Detail |
| --- | --- |
| Reproducibility | CI and contributors need identical inputs; pinned synthetic output from seed.py guarantees that. |
| Safety | Production logs routinely contain secrets, PII, and internal hostnames — they must not land in a public git history. |
| Licensing | Public "open" log corpora still carry terms (attribution, research-only, no redistribution). Compliance is your obligation when you import them. |
| Schema & labels | GHOST experiments expect structured records and (for scoring) known failure classes. Raw downloads need ETL and often manual or semi-automatic labeling. |

“Training” in this PoC does not mean neural-network training. The agents are explicit policies (substring / ordered rules + decision tables). Improving them means engineering: extend skills/watcher_skills.py and skills/k8s_signal_skills.py, validate with harness.py. A future ML layer would be a separate pipeline with its own data governance.
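That engineering loop — propose a pattern, then confirm the healthy corpus still produces zero hits with the same matcher the Watcher uses — can be sketched like this (the helper name echoes the shared-helper idea above, but the function body and corpus are invented for illustration):

```python
# Sketch of the policy-extension workflow: add a phrase, then confirm the
# healthy corpus still produces zero hits using the SAME matcher the
# Watcher uses (helper, patterns, and corpus here are illustrative).
def any_pattern_matches_message(message, patterns):
    msg = message.casefold()
    return any(p in msg for phrases in patterns.values() for p in phrases)

patterns = {"high_latency": ["exceeded latency threshold"]}
healthy_corpus = [
    "GET /health 200 in 12ms",
    "response time 40ms within threshold",
]

patterns["high_latency"].append("request timeout")  # proposed policy change

false_hits = [line for line in healthy_corpus
              if any_pattern_matches_message(line, patterns)]
print(false_hits)  # [] means safe to promote; non-empty means rework the phrase
```

In the repo, the equivalent check is running harness.py: Experiments 3 and 5 assert zero false positives over the healthy lines after any skills edit.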

Where to put optional real or redacted samples locally: data/external/README.md — files there stay out of git by default (except that README). See the full operational guide in docs/HELP.md (FAQ: Can we download real scenarios from open-source log providers?).


Production & mission-critical systems

GHOST Phase 1 is a laboratory instrument, not a production controller. The ideas it embodies, however, map directly to how serious teams introduce automation safely.

What transfers well

  • Explicit policy (versioned patterns + action tables) with separation from execution code — supports review, RBAC on changes, and post-incident audit (“what could the robot do?”).
  • Closed-loop tests before prod: the same structure you see in Exp 2–3 should eventually run against staging APIs with frozen golden logs and expected state transitions.
  • Fast, bounded remediation for known classes: restarts within caps, scale-out within limits, cache clears — actions that are reversible and idempotent when designed well.

What production must add

| Risk in naive automation | Mitigation in mission-critical environments |
| --- | --- |
| Log substring false positives | Structured signals, alert correlation, rate limits, dry-run / canary, human approval for destructive classes |
| Blast radius | Hard quotas, multi-account isolation, circuit breakers, automatic rollback hooks |
| Unknown / correlated failures | Escalation paths, SLO-based policy, runbook coverage; LLM/heuristics after guardrails and retrieval — not instead of deterministic paths |
| Governance | IAM-bound actions, change windows, immutable audit trail, integration with ticketing and postmortems |

Practical tiers (how organizations usually evolve)

  1. Assisted ops — automation gathers context and proposes steps; humans execute risky changes.
  2. Guardrailed autonomy — small set of low-blast, reversible actions with hard caps and shadow mode first.
  3. Expanded policy — broader coverage only where harnesses and game days prove safety.

A fill-in template aligned to these ideas (charter, tier definitions, blast radius, drills) lives in docs/GOVERNANCE.md.

Bottom line: GHOST demonstrates that a deterministic autonomous loop can be built clearly and tested. For mission-critical workloads, the long-term value is shorter MTTR on known paths and less cognitive load on operators — provided automation is constrained, observable, and never the only line of defense.


Research: layered failures & learning

Today’s PoC is intentionally small. The next step toward human-like troubleshooting under incomplete information is to reason across layers (logs, manifests, network, APIs, data) with specialist agents and a coordinator, not a single log grep.

  • docs/VISION_LAYERED_LEARNING.md — layered failure model, partial observability, swarm-style roles (Hermes-like orchestration without claiming a product), topology-aware bias, an honest taxonomy of feedback loops, and how external development tooling (e.g. gstack) fits next to GHOST as policy/code authoring support — not as an unguarded production operator.
  • metrics/feedback.py — after each harness.py run, an append-only feedback_rows record is stored in metrics/results.db with pass/fail flags for all five experiments, layer tags (including log_near_real_noisy), and policy (skills) versions so batch jobs can correlate outcomes with policy state (hook for offline policy improvement — not online learning in agents).

Agents here do not perform online gradient descent; “learning” means closing the loop from verified outcomes into policy updates you promote through tests.
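An append-only ledger like feedback_rows needs nothing beyond the stdlib; this sketch stores each run summary as a JSON blob in SQLite (table and column names are illustrative, not the repo's actual metrics schema):

```python
import json
import sqlite3

# Minimal sketch of an append-only feedback ledger (table/column names
# are illustrative, not the repo's actual metrics schema).
conn = sqlite3.connect(":memory:")  # the repo persists to metrics/results.db
conn.execute("""CREATE TABLE IF NOT EXISTS feedback_rows (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_summary TEXT NOT NULL)""")

summary = {"policy_version": "example-1",
           "exp_pass": [True] * 5,
           "exp3_counts": {"detected": 10, "false_positives": 0}}
conn.execute("INSERT INTO feedback_rows (run_summary) VALUES (?)",
             (json.dumps(summary),))
conn.commit()

rows = conn.execute("SELECT run_summary FROM feedback_rows").fetchall()
print(len(rows), json.loads(rows[0][0])["exp3_counts"])
```

Because rows are only ever inserted, a later batch job can correlate outcome trends with policy versions without any risk of the harness rewriting history.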


Quick start

```shell
git clone https://github.com/beejak/GHOST-PoC.git
cd GHOST-PoC
python data/seed.py
python harness.py
```

Important: Generated JSON under data/ is not committed (see .gitignore). Always run seed.py after a fresh clone before harness.py.

Optional: python data/seed.py --seed 123 — different shuffle of failures inside the mixed stream.

Runtime: Python 3.11+ recommended; 3.9+ may work with the current codebase. Phase 1 requires no pip install.


Quick start by persona

| Persona | Fastest path | Why this path |
| --- | --- | --- |
| Operator / SRE evaluator | python data/seed.py -> python harness.py | Confirms baseline policy correctness before touching any lab tooling. |
| Policy author (skills editor) | Edit skills/ -> python data/seed.py -> python harness.py | Ensures every rule change is validated across all five experiments. |
| Shadow-mode reviewer | python adapters/observe.py data/mixed_stream.json | Lets you inspect detections without mutation side effects. |
| Autonomy rehearsal owner | python adapters/lab_run.py --dry-run data/near_real_stream.json | Exercises full detect/decide flow while staying non-destructive. |
| Lab pipeline engineer | ./lab/bootstrap_lab.ps1 -> ./lab/inject_failures.ps1 -> ./lab/collect_and_normalize.ps1 | Produces replayable local K8s-derived data with measurable outcomes. |
| Governance / risk lead | Read docs/GOVERNANCE.md + docs/LAB_RUN_REPORT_20260331.md | Maps technical results to rollout tiers, blast radius, and controls. |

Help, FAQ & troubleshooting

| Question | Short answer |
| --- | --- |
| Why no real logs in the repo? | Reproducibility, CI, licensing, and secret/PII risk — see Data: synthetic vs real above. |
| Can we download open-source log datasets to "train" the agents? | Yes, locally, if the license fits your use case. Today's agents are rule-based; you refine skills and re-run the harness, not a model trainer. Normalize into the same JSON shape as generated clean_failures.json. |
| Harness failed on experiment N | Re-run python data/seed.py. If it persists, open docs/HELP.md → Quick troubleshooting and match the error pattern. "Integration contract validation failed" means integrations/validate.py exited non-zero (missing contract file or expected path). |
| Where is detailed help? | docs/HELP.md — troubleshooting table, extended FAQ, extension patterns, support pointers. |
| How do I change detection or healing? | Only via skills/ and simulator/infra_state.py; never duplicate tables inside agents/. |

Common fixes

  1. FileNotFoundError on data/*.json → run python data/seed.py from the repository root.
  2. Healthy baseline assertion failed → template overlap with patterns; adjust data/seed.py or skills/watcher_skills.py.
  3. All timings 0 ms → expected on fast CPUs; assertions still prove correctness.

For incident-style walkthroughs, licensing notes, and a path to optional data/external/ workflows, read docs/HELP.md.


References & credits

Attribution for external repositories and upstream projects referenced by this PoC:

| Project | Link | How it is used here |
| --- | --- | --- |
| GHOST-PoC (this repository) | beejak/GHOST-PoC | Primary implementation, experiments, docs, and CI. |
| gstack | garrytan/gstack | Referenced for skill-oriented AI development workflows; integrated via compatibility docs and maintainer skill patterns under integrations/gstack/. |
| Hermes Agent (Nous Research) | NousResearch/hermes-agent | Referenced as an optional external agent runtime; this repo ships only integration contracts/policies in integrations/hermes/, not Hermes runtime code. |

Notes on scope and credit:

  • This repo does not vendor third-party runtime source from gstack or Hermes.
  • Integration is contract-based (policy files, prompts, maintainer guidance), with explicit upstream links for install and licensing.
  • If additional external repos are adopted later, add them here with link, license, and exact usage boundary.

Project structure

GHOST-PoC/
├── docs/
│   ├── HELP.md             # In-depth help, FAQ, real-log guidance, troubleshooting
│   └── GOVERNANCE.md       # Template: tiers, charter, game days (org process; not enforced in code)
├── adapters/               # Optional observe / lab_run (local files → agents)
├── lab/                    # Optional K8s lab scripts/manifests (bootstrap/inject/pipeline wrapper)
├── tools/                  # External data pipeline scripts (collect/normalize/replay)
├── integrations/           # Hermes + gstack-compatible contracts; validate.py (no LLM in CI)
├── skills/                 # Policy: log patterns, K8s signal rules, decision table
├── agents/                 # Watcher, K8s watcher & Healer (import skills only)
├── blackboard/             # Event bus (asyncio queue + validation)
├── simulator/              # Fake infra state + action implementations
├── data/
│   ├── seed.py             # Synthetic dataset generator
│   ├── generator.py        # Async JSON stream for harness
│   ├── scenarios.json      # Scenario metadata
│   └── external/           # Gitignored drops for redacted real samples (README only in git)
├── experiments/            # Experiment 1–5 runners
├── metrics/                # SQLite recorder, reporter, harness feedback ledger
├── harness.py              # Single entrypoint: all experiments
├── Ghost PoC.md.txt        # Full build specification
├── README.md
├── LICENSE
└── requirements.txt        # Phase 2 placeholders only

Documentation index

| Document | When to read it |
| --- | --- |
| README.md (this file) | First-time orientation, architecture, validation summary, quick start. |
| docs/HELP.md | Operational help: troubleshooting matrix, full FAQ (including real vs synthetic logs), extension guide, support. |
| docs/VISION_LAYERED_LEARNING.md | Research architecture: layered failures, partial info, runtime swarm pattern, feedback roadmap, virtual dev team vs GHOST boundary. |
| docs/GOVERNANCE.md | Rollout template: autonomy tiers, policy change control, blast radius, game days (fill in for your org). |
| docs/LAB_RUN_REPORT_20260331.md | First executed local lab pipeline report (artifacts + replay metrics). |
| Ghost PoC.md.txt | Formal specification, definition of done, build order, synthetic vs real appendix. |
| data/external/README.md | Where optional local / redacted corpora go and what not to commit. |
| lab/README.md | Minimal lab workflow to generate external data and replay it locally. |
| tools/README.md | Collect / normalize / replay scripts for external datasets. |
| integrations/README.md | Hermes (Nous) tool policy + maintainer skill aligned with gstack; validate.py runs inside harness.py. |
| integrations/hermes/README.md | Installing Hermes upstream; mapping TOOL_POLICY.json to your tool config. |
| integrations/gstack/README.md | Vendoring / using the gstack-compatible maintainer skill next to upstream gstack. |

License

Licensed under the Apache License 2.0 — see LICENSE.


GHOST · Prove the loop in the lab. Earn the right to run it in production.

If you extend this work, preserve the skills-as-policy pattern — it is the primary maintainability and auditability lesson from Phase 1.
