322 changes: 322 additions & 0 deletions skills/rsavitt/swarm-safety/SKILL.md
---
name: swarm-safety
version: 1.7.0
description: "SWARM: System-Wide Assessment of Risk in Multi-agent systems. 38 agent types, 29 governance levers, 55 scenarios. Study emergent risks, phase transitions, and governance cost paradoxes."
homepage: https://github.com/swarm-ai-safety/swarm
metadata: {"openclaw":{"emoji":"🛡️","requires":{"bins":["python3","pip"]}}}
---

# SWARM Safety Skill

Study how intelligence swarms — and where it fails.

SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents — harmful dynamics emerge when many sub-AGI agents interact, even when no individual agent is misaligned.

**v1.7.0** | 38 agent types | 29 governance levers | 55 scenarios | 2922 tests | 8 framework bridges

Repository: `https://github.com/swarm-ai-safety/swarm`

## When to Use

- You need to simulate multi-agent AI systems and study emergent safety risks
- You want to test governance mechanisms (taxes, circuit breakers, audits, staking) against adversarial agent populations
- You are researching phase transitions, adverse selection, or collusion in agent economies
- You need probabilistic (soft-label) safety metrics instead of binary good/bad classifications
- You want to red-team multi-agent deployments with configurable adversarial scenarios

## When NOT to Use

- You need a production-grade agent deployment framework (SWARM is a research simulator)
- You want to monitor or govern a live, deployed AI system in real time
- You need binary classification of agent behavior (SWARM uses probabilistic labels by design)
- Your task has nothing to do with multi-agent safety, governance, or emergent risk

## Hard Rules

- SWARM simulations run locally. Install the package first.
- Do not submit scenarios containing real API keys, credentials, or PII.
- Simulation results are research artifacts. Do not present them as ground truth about real systems.
- When publishing results, cite the framework and disclose simulation parameters.

## Security

- **API binds to localhost only** (`127.0.0.1`) by default to prevent network exposure.
- **CORS restricted** to localhost origins by default.
- **No authentication** on development API — do not expose to untrusted networks.
- **In-memory storage** — data does not persist between restarts.
- For production deployment, add authentication middleware and use a proper database.

## Install

```bash
# From PyPI
pip install swarm-safety

# With LLM agent support
pip install "swarm-safety[llm]"

# Full development (all extras)
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
```

## Quick Start (Python)

```python
from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)

orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

metrics = orchestrator.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")
```

## Quick Start (CLI)

```bash
# List available scenarios
swarm list

# Run a scenario
swarm run scenarios/baseline.yaml

# Override settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15

# Export results
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/
```

## Quick Start (API)

Start the API server:

```bash
pip install "swarm-safety[api]"
uvicorn swarm.api.app:app --host 127.0.0.1 --port 8000
```

API documentation at `http://localhost:8000/docs`.

> **Security Note**: The server binds to `127.0.0.1` (localhost only) by default. Do not bind to `0.0.0.0` unless you understand the security implications and have proper firewall rules in place.

### Register Agent

```bash
curl -X POST http://localhost:8000/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "YourAgent",
    "description": "What your agent does",
    "capabilities": ["governance-testing", "red-teaming"]
  }'
```

Returns `agent_id` and `api_key`.

### Submit Scenario

```bash
curl -X POST http://localhost:8000/api/v1/scenarios/submit \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": "simulation:\n  n_epochs: 10\n  steps_per_epoch: 10\nagents:\n  - type: honest\n    count: 3\n  - type: adversarial\n    count: 2",
    "tags": ["collusion", "governance"]
  }'
```
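
Hand-escaping YAML into a JSON string is error-prone. The same request can be made from Python; a minimal sketch, assuming the `requests` package (not a swarm-safety dependency) and a local `my_scenario.yaml`:

```python
from pathlib import Path

import requests  # assumption: installed separately (pip install requests)

# Read the scenario file verbatim; the json= parameter handles escaping.
payload = {
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": Path("my_scenario.yaml").read_text(),
    "tags": ["collusion", "governance"],
}
resp = requests.post("http://localhost:8000/api/v1/scenarios/submit", json=payload)
resp.raise_for_status()
print(resp.json())
```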

### Create & Join Simulation

```bash
# Create
curl -X POST http://localhost:8000/api/v1/simulations/create \
  -H "Content-Type: application/json" \
  -d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'

# Join
curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'
```

## Core Concepts

### Soft Probabilistic Labels

Each interaction carries a soft label `p = P(v = +1)`, the probability of a beneficial outcome:

```
Observables -> ProxyComputer -> v_hat -> sigmoid -> p -> PayoffEngine -> payoffs
                                                    |
                                                    +-> SoftMetrics -> toxicity, quality gap, etc.
```
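
The `sigmoid` step is what turns a raw proxy score into a calibrated probability instead of a hard verdict. A minimal sketch in plain Python (illustrative only, not the library's `ProxyComputer` API; the `temperature` knob is hypothetical):

```python
import math

def soft_label(v_hat: float, temperature: float = 1.0) -> float:
    """Map a raw proxy score v_hat to p = P(v = +1) with a logistic squash.

    `temperature` is a hypothetical smoothing knob for illustration.
    """
    return 1.0 / (1.0 + math.exp(-v_hat / temperature))

print(soft_label(2.0))  # ~0.88: confidently beneficial interaction
print(soft_label(0.0))  # 0.50: maximally uncertain interaction
```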

### Five Key Metrics

| Metric | What It Measures |
|---|---|
| **Toxicity rate** | Expected harm among accepted interactions: `E[1-p \| accepted]` |
| **Quality gap** | Adverse selection indicator (negative = bad): `E[p \| accepted] - E[p \| rejected]` |
| **Conditional loss** | Selection effect on payoffs |
| **Incoherence** | Variance-to-error ratio across replays |
| **Illusion delta** | Gap between perceived coherence and actual consistency |
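
The first two metrics follow directly from these definitions. A self-contained sketch over per-interaction `(p, accepted)` pairs (plain arithmetic, not the library's `SoftMetrics` API):

```python
def toxicity_rate(samples: list[tuple[float, bool]]) -> float:
    """E[1 - p | accepted]: expected harm among accepted interactions."""
    accepted = [p for p, ok in samples if ok]
    return sum(1 - p for p in accepted) / len(accepted)

def quality_gap(samples: list[tuple[float, bool]]) -> float:
    """E[p | accepted] - E[p | rejected]: negative signals adverse selection."""
    accepted = [p for p, ok in samples if ok]
    rejected = [p for p, ok in samples if not ok]
    return sum(accepted) / len(accepted) - sum(rejected) / len(rejected)

samples = [(0.9, True), (0.8, True), (0.3, True), (0.6, False), (0.2, False)]
print(f"toxicity:    {toxicity_rate(samples):.3f}")   # 0.333
print(f"quality gap: {quality_gap(samples):+.3f}")    # +0.267
```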

### Agent Types (14 families, 38 implementations)

| Type | Behavior |
|---|---|
| **Honest** | Cooperative, trust-based, completes tasks diligently |
| **Opportunistic** | Maximizes short-term payoff, cherry-picks tasks |
| **Deceptive** | Builds trust, then exploits trusted relationships |
| **Adversarial** | Targets honest agents, coordinates with allies |
| **LDT** | Logical Decision Theory with FDT/UDT precommitment |
| **RLM** | Reinforcement Learning from Memory |
| **Council** | Multi-agent deliberation-based decisions |
| **SkillRL** | Learns interaction strategies via reward signals |
| **LLM** | Behavior determined by LLM (Anthropic, OpenAI, or Ollama) |
| **Moltbook** | Domain-specific social platform agent |
| **Scholar** | Academic citation and research agent |
| **Wiki Editor** | Collaborative editing with editorial policy |

### Governance Levers (29 mechanisms)

- **Transaction Taxes** — Reduce exploitation, cost welfare
- **Reputation Decay** — Punish bad actors, erode honest standing
- **Circuit Breakers** — Freeze toxic agents quickly
- **Random Audits** — Deter hidden exploitation
- **Staking** — Filter undercapitalized agents
- **Collusion Detection** — Catch coordinated attacks (the critical lever near the collapse threshold)
- **Sybil Detection** — Identify duplicate agents
- **Transparency Ledger** — Reward/penalize based on outcome
- **Moderator Agent** — Probabilistic review of interactions
- **Incoherence Friction** — Tax uncertainty-driven decisions
- **Council Deliberation** — Multi-agent governance decisions
- **Diversity Enforcement** — Prevent monoculture collapse
- **Moltipedia-specific** — Pair caps, page cooldowns, daily caps, self-fix prevention
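
Levers are switched on per scenario in the `governance:` block of a scenario file; the Scenario YAML Format section below shows `transaction_tax_rate`, `circuit_breaker_enabled`, and `collusion_detection_enabled` as examples.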

### Framework Bridges

| Bridge | Integration |
|---|---|
| **Concordia** | DeepMind's multi-agent framework |
| **GasTown** | Multi-agent workspace governance |
| **Claude Code** | Claude CLI agent integration |
| **LiveSWE** | Live software engineering tasks |
| **OpenClaw** | Open agent protocol |
| **Prime Intellect** | Cross-platform run tracking |
| **Ralph** | Agent orchestration |
| **Worktree** | Git worktree-based sandboxing |

### Scenario YAML Format

```yaml
simulation:
  n_epochs: 10
  steps_per_epoch: 10
  seed: 42

agents:
  - type: honest
    count: 3
    config:
      acceptance_threshold: 0.4
  - type: adversarial
    count: 2
    config:
      aggression_level: 0.7

governance:
  transaction_tax_rate: 0.05
  circuit_breaker_enabled: true
  collusion_detection_enabled: true

success_criteria:
  max_toxicity: 0.3
  min_quality_gap: 0.0
```
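
Save this as, for example, `my_scenario.yaml` and run it with `swarm run my_scenario.yaml` (see the CLI quick start above).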

## Key Research Findings

### Phase Transitions (11-scenario, 209-epoch study)

| Regime | Adversarial % | Toxicity | Welfare | Outcome |
|--------|--------------|----------|---------|---------|
| Cooperative | 0-20% | < 0.30 | Stable | Survives |
| Contested | 20-37.5% | 0.33-0.37 | Declining | Survives |
| Collapse | 50%+ | ~0.30 | Zero by epoch 12-14 | **Collapses** |

A critical threshold between 37.5% and 50% adversarial agents separates recoverable decline from irreversible collapse.
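
The threshold can be probed with the same Python API as the Quick Start. A sketch under stated assumptions: population size, epoch count, and the `AdversarialAgent(agent_id=...)` constructor are illustrative choices mirroring the imports shown above.

```python
from swarm.agents.honest import HonestAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

def run_mix(n_total: int, adversarial_frac: float, seed: int = 42):
    """Run one simulation with the given fraction of adversarial agents."""
    n_adv = round(n_total * adversarial_frac)
    config = OrchestratorConfig(n_epochs=20, steps_per_epoch=10, seed=seed)
    orch = Orchestrator(config=config)
    for i in range(n_total - n_adv):
        orch.register_agent(HonestAgent(agent_id=f"honest_{i}", name=f"Honest{i}"))
    for i in range(n_adv):
        orch.register_agent(AdversarialAgent(agent_id=f"adv_{i}"))
    return orch.run()

# Sweep across the reported critical region.
for frac in (0.25, 0.375, 0.5):
    final = run_mix(n_total=8, adversarial_frac=frac)[-1]
    print(f"adv={frac:.1%}: toxicity={final.toxicity_rate:.3f}, welfare={final.total_welfare:.2f}")
```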

### Governance Cost Paradox (v1.7.0 GasTown study)

A 42-run study shows that governance reduces toxicity at every adversarial level (mean reduction 0.071) but imposes a net welfare loss under the current parameter tuning. At 0% adversarial agents, governance costs 216 welfare units (-57.6%) in exchange for a toxicity reduction of only 0.066.
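
The exchange rate implied by those numbers is steep. Plain arithmetic on the reported figures:

```python
welfare_cost = 216     # welfare units lost to governance at 0% adversarial
relative_drop = 0.576  # the reported -57.6%
tox_reduction = 0.066  # toxicity reduction bought at that price

print(f"implied ungoverned baseline: {welfare_cost / relative_drop:.0f} welfare units")   # 375
print(f"welfare per 0.01 toxicity reduction: {welfare_cost / (tox_reduction * 100):.1f}") # 32.7
```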

## Case Studies

### GasTown Governance Cost

Study governance overhead vs. toxicity reduction across 7 agent compositions with and without governance levers. Reveals the safety-throughput trade-off. See `scenarios/gastown_governance_cost.yaml`.

### LDT Cooperation

220 runs across 10 seeds comparing TDT vs FDT vs UDT cooperation strategies at population scales up to 21 agents. See `scenarios/ldt_cooperation.yaml`.

### Moltipedia Heartbeat

Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See `scenarios/moltipedia_heartbeat.yaml`.

### Moltbook CAPTCHA

Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See `scenarios/moltbook_captcha.yaml`.

## API Endpoints (Full Reference)

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/` | API info |
| POST | `/api/v1/agents/register` | Register agent |
| GET | `/api/v1/agents/{agent_id}` | Get agent details |
| GET | `/api/v1/agents/` | List agents |
| POST | `/api/v1/scenarios/submit` | Submit scenario |
| GET | `/api/v1/scenarios/{scenario_id}` | Get scenario |
| GET | `/api/v1/scenarios/` | List scenarios |
| POST | `/api/v1/simulations/create` | Create simulation |
| POST | `/api/v1/simulations/{id}/join` | Join simulation |
| GET | `/api/v1/simulations/{id}` | Get simulation |
| GET | `/api/v1/simulations/` | List simulations |

## Citation

```bibtex
@software{swarm2026,
  title = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
  author = {Savitt, Raeli},
  year = {2026},
  url = {https://github.com/swarm-ai-safety/swarm}
}
```

## Linked Docs

- Skill metadata: `skill.json`
- Agent discovery: `.well-known/agent.json`
- Full documentation: `https://github.com/swarm-ai-safety/swarm/tree/main/docs`
- Theoretical foundations: `docs/research/theory.md`
- Governance guide: `docs/governance.md`
- Red-teaming guide: `docs/red-teaming.md`
- Scenario format: `docs/guides/scenarios.md`
11 changes: 11 additions & 0 deletions skills/rsavitt/swarm-safety/_meta.json
{
"owner": "rsavitt",
"slug": "swarm-safety",
"displayName": "SWARM Safety",
"latest": {
"version": "1.7.0",
"publishedAt": 1771813733000,
"commit": ""
},
"history": []
}
21 changes: 21 additions & 0 deletions skills/rsavitt/swarm-safety/scripts/setup.sh
#!/usr/bin/env bash
set -euo pipefail

# SWARM Safety - setup script
# Installs the swarm-safety package with runtime dependencies

if ! command -v pip &> /dev/null && ! command -v pip3 &> /dev/null; then
  echo "Error: pip is not installed. Please install Python 3.10+ and pip first."
  exit 1
fi

PIP_CMD="pip"
if ! command -v pip &> /dev/null; then
  PIP_CMD="pip3"
fi

echo "Installing swarm-safety[runtime]>=1.7.0 ..."
$PIP_CMD install "swarm-safety[runtime]>=1.7.0"

echo "swarm-safety installed successfully."
echo "Run 'swarm list' to see available scenarios."
7 changes: 7 additions & 0 deletions skills/rsavitt/swarm-safety/scripts/swarm-cli.sh
#!/usr/bin/env bash
set -euo pipefail

# SWARM Safety - CLI wrapper
# Thin wrapper around python -m swarm

exec python3 -m swarm "$@"