diff --git a/skills/rsavitt/swarm-safety/SKILL.md b/skills/rsavitt/swarm-safety/SKILL.md
new file mode 100644
index 00000000000..3032866d047
--- /dev/null
+++ b/skills/rsavitt/swarm-safety/SKILL.md
@@ -0,0 +1,322 @@
+---
+name: swarm-safety
+version: 1.7.0
+description: "SWARM: System-Wide Assessment of Risk in Multi-agent systems. 38 agent types, 29 governance levers, 55 scenarios. Study emergent risks, phase transitions, and governance cost paradoxes."
+homepage: https://github.com/swarm-ai-safety/swarm
+metadata: {"openclaw":{"emoji":"🛡️","requires":{"bins":["python3","pip"]}}}
+---
+
+# SWARM Safety Skill
+
+Study how intelligence swarms — and where it fails.
+
+SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents — harmful dynamics emerge when many sub-AGI agents interact, even when no individual agent is misaligned.
+
+**v1.7.0** | 38 agent types | 29 governance levers | 55 scenarios | 2922 tests | 8 framework bridges
+
+Repository: `https://github.com/swarm-ai-safety/swarm`
+
+## When to Use
+
+- You need to simulate multi-agent AI systems and study emergent safety risks
+- You want to test governance mechanisms (taxes, circuit breakers, audits, staking) against adversarial agent populations
+- You are researching phase transitions, adverse selection, or collusion in agent economies
+- You need probabilistic (soft-label) safety metrics instead of binary good/bad classifications
+- You want to red-team multi-agent deployments with configurable adversarial scenarios
+
+## When NOT to Use
+
+- You need a production-grade agent deployment framework (SWARM is a research simulator)
+- You want to monitor or govern a live, deployed AI system in real time
+- You need binary classification of agent behavior (SWARM uses probabilistic labels by design)
+- Your task has nothing to do with multi-agent safety, governance, or emergent risk
+
+## Hard Rules
+
+- SWARM simulations run locally. Install the package first.
+- Do not submit scenarios containing real API keys, credentials, or PII.
+- Simulation results are research artifacts. Do not present them as ground truth about real systems.
+- When publishing results, cite the framework and disclose simulation parameters.
+
+## Security
+
+- **API binds to localhost only** (`127.0.0.1`) by default to prevent network exposure.
+- **CORS restricted** to localhost origins by default.
+- **No authentication** on the development API — do not expose it to untrusted networks.
+- **In-memory storage** — data does not persist between restarts.
+- For production deployment, add authentication middleware and use a proper database.
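+
+A minimal sketch of that hardening step, assuming the served app is a FastAPI instance (which the `uvicorn swarm.api.app:app` entrypoint and `/docs` route below suggest). The header name, key source, and file name are illustrative, not part of swarm-safety:
+
+```python
+# auth_wrapper.py: illustrative sketch only, not shipped with swarm-safety.
+import os
+
+from fastapi import Request
+from fastapi.responses import JSONResponse
+
+from swarm.api.app import app  # the app served in Quick Start (API) below
+
+API_KEY = os.environ["SWARM_API_KEY"]  # supply via environment, never hardcode
+
+@app.middleware("http")
+async def require_api_key(request: Request, call_next):
+    # Reject any request that does not carry the expected key.
+    if request.headers.get("x-api-key") != API_KEY:
+        return JSONResponse(status_code=401, content={"detail": "unauthorized"})
+    return await call_next(request)
+```
+
+Serve with `uvicorn auth_wrapper:app --host 127.0.0.1 --port 8000` instead of the bare app.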
+
+## Install
+
+```bash
+# From PyPI
+pip install swarm-safety
+
+# With LLM agent support
+pip install "swarm-safety[llm]"
+
+# Full development (all extras)
+git clone https://github.com/swarm-ai-safety/swarm.git
+cd swarm
+pip install -e ".[dev,runtime]"
+```
+
+## Quick Start (Python)
+
+```python
+from swarm.agents.honest import HonestAgent
+from swarm.agents.opportunistic import OpportunisticAgent
+from swarm.agents.deceptive import DeceptiveAgent
+from swarm.core.orchestrator import Orchestrator, OrchestratorConfig
+
+config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
+orchestrator = Orchestrator(config=config)
+
+orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
+orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
+orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
+orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))
+
+metrics = orchestrator.run()
+for m in metrics:
+    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")
+```
+
+## Quick Start (CLI)
+
+```bash
+# List available scenarios
+swarm list
+
+# Run a scenario
+swarm run scenarios/baseline.yaml
+
+# Override settings
+swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15
+
+# Export results
+swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/
+```
+
+## Quick Start (API)
+
+Start the API server:
+
+```bash
+pip install "swarm-safety[api]"
+uvicorn swarm.api.app:app --host 127.0.0.1 --port 8000
+```
+
+API documentation at `http://localhost:8000/docs`.
+
+> **Security Note**: The server binds to `127.0.0.1` (localhost only) by default. Do not bind to `0.0.0.0` unless you understand the security implications and have proper firewall rules in place.
+
+### Register Agent
+
+```bash
+curl -X POST http://localhost:8000/api/v1/agents/register \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "YourAgent",
+    "description": "What your agent does",
+    "capabilities": ["governance-testing", "red-teaming"]
+  }'
+```
+
+Returns `agent_id` and `api_key`.
+
+### Submit Scenario
+
+```bash
+curl -X POST http://localhost:8000/api/v1/scenarios/submit \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "my-scenario",
+    "description": "Testing collusion detection with 5 agents",
+    "yaml_content": "simulation:\n  n_epochs: 10\n  steps_per_epoch: 10\nagents:\n  - type: honest\n    count: 3\n  - type: adversarial\n    count: 2",
+    "tags": ["collusion", "governance"]
+  }'
+```
+
+### Create & Join Simulation
+
+```bash
+# Create
+curl -X POST http://localhost:8000/api/v1/simulations/create \
+  -H "Content-Type: application/json" \
+  -d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'
+
+# Join
+curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join \
+  -H "Content-Type: application/json" \
+  -d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'
+```
+
+## Core Concepts
+
+### Soft Probabilistic Labels
+
+Each interaction carries `p = P(v = +1)`, the probability of a beneficial outcome:
+
+```
+Observables -> ProxyComputer -> v_hat -> sigmoid -> p -> PayoffEngine -> payoffs
+                                                    |
+                                                    SoftMetrics -> toxicity, quality gap, etc.
+```
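+
+As a concrete illustration of that arithmetic, here is the toxicity rate and quality gap computed by hand in plain Python. No SWARM imports are used; the interaction data is made up for exposition, and the formulas come from the table below:
+
+```python
+import math
+
+def sigmoid(v_hat: float) -> float:
+    """Map the proxy estimate v_hat to p = P(v = +1)."""
+    return 1.0 / (1.0 + math.exp(-v_hat))
+
+# Hypothetical interactions: (proxy estimate v_hat, accepted?)
+interactions = [(2.0, True), (0.4, True), (-0.3, True), (-1.5, False)]
+
+accepted = [sigmoid(v) for v, ok in interactions if ok]
+rejected = [sigmoid(v) for v, ok in interactions if not ok]
+
+# Toxicity rate: E[1 - p | accepted], expected harm among accepted interactions
+toxicity = sum(1 - p for p in accepted) / len(accepted)
+
+# Quality gap: E[p | accepted] - E[p | rejected]; negative signals adverse selection
+quality_gap = sum(accepted) / len(accepted) - sum(rejected) / len(rejected)
+
+print(f"toxicity={toxicity:.3f}, quality_gap={quality_gap:.3f}")
+```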
+
+### Five Key Metrics
+
+| Metric | What It Measures |
+|---|---|
+| **Toxicity rate** | Expected harm among accepted interactions: `E[1-p \| accepted]` |
+| **Quality gap** | Adverse selection indicator (negative = bad): `E[p \| accepted] - E[p \| rejected]` |
+| **Conditional loss** | Selection effect on payoffs |
+| **Incoherence** | Variance-to-error ratio across replays |
+| **Illusion delta** | Gap between perceived coherence and actual consistency |
+
+### Agent Types (14 families, 38 implementations)
+
+Representative families:
+
+| Type | Behavior |
+|---|---|
+| **Honest** | Cooperative, trust-based, completes tasks diligently |
+| **Opportunistic** | Maximizes short-term payoff, cherry-picks tasks |
+| **Deceptive** | Builds trust, then exploits trusted relationships |
+| **Adversarial** | Targets honest agents, coordinates with allies |
+| **LDT** | Logical Decision Theory with FDT/UDT precommitment |
+| **RLM** | Reinforcement Learning from Memory |
+| **Council** | Multi-agent deliberation-based decisions |
+| **SkillRL** | Learns interaction strategies via reward signals |
+| **LLM** | Behavior determined by LLM (Anthropic, OpenAI, or Ollama) |
+| **Moltbook** | Domain-specific social platform agent |
+| **Scholar** | Academic citation and research agent |
+| **Wiki Editor** | Collaborative editing with editorial policy |
+
+### Governance Levers (29 mechanisms)
+
+Highlights include:
+
+- **Transaction Taxes** — Reduce exploitation, cost welfare
+- **Reputation Decay** — Punish bad actors, erode honest standing
+- **Circuit Breakers** — Freeze toxic agents quickly
+- **Random Audits** — Deter hidden exploitation
+- **Staking** — Filter undercapitalized agents
+- **Collusion Detection** — Catch coordinated attacks (the critical lever near the collapse threshold)
+- **Sybil Detection** — Identify duplicate agents
+- **Transparency Ledger** — Reward/penalize based on outcome
+- **Moderator Agent** — Probabilistic review of interactions
+- **Incoherence Friction** — Tax uncertainty-driven decisions
+- **Council Deliberation** — Multi-agent governance decisions
+- **Diversity Enforcement** — Prevent monoculture collapse
+- **Moltipedia-specific** — Pair caps, page cooldowns, daily caps, self-fix prevention
+
+### Framework Bridges
+
+| Bridge | Integration |
+|---|---|
+| **Concordia** | DeepMind's multi-agent framework |
+| **GasTown** | Multi-agent workspace governance |
+| **Claude Code** | Claude CLI agent integration |
+| **LiveSWE** | Live software engineering tasks |
+| **OpenClaw** | Open agent protocol |
+| **Prime Intellect** | Cross-platform run tracking |
+| **Ralph** | Agent orchestration |
+| **Worktree** | Git worktree-based sandboxing |
+
+### Scenario YAML Format
+
+```yaml
+simulation:
+  n_epochs: 10
+  steps_per_epoch: 10
+  seed: 42
+
+agents:
+  - type: honest
+    count: 3
+    config:
+      acceptance_threshold: 0.4
+  - type: adversarial
+    count: 2
+    config:
+      aggression_level: 0.7
+
+governance:
+  transaction_tax_rate: 0.05
+  circuit_breaker_enabled: true
+  collusion_detection_enabled: true
+
+success_criteria:
+  max_toxicity: 0.3
+  min_quality_gap: 0.0
+```
+
+## Key Research Findings
+
+### Phase Transitions (11-scenario, 209-epoch study)
+
+| Regime | Adversarial % | Toxicity | Welfare | Outcome |
+|--------|--------------|----------|---------|---------|
+| Cooperative | 0-20% | < 0.30 | Stable | Survives |
+| Contested | 20-37.5% | 0.33-0.37 | Declining | Survives |
+| Collapse | 50%+ | ~0.30 | Zero by epoch 12-14 | **Collapses** |
+
+A critical threshold between 37.5% and 50% adversarial agents separates recoverable from irreversible collapse.
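+
+A rough way to probe that threshold yourself, using only the Python API from the Quick Start. The agent counts and epoch budget here are arbitrary, and the sketch assumes `name` is optional on agent constructors (as the Quick Start's `OpportunisticAgent` call suggests) and that `run()` returns the per-epoch metrics sequence shown there:
+
+```python
+from swarm.agents.adversarial import AdversarialAgent
+from swarm.agents.honest import HonestAgent
+from swarm.core.orchestrator import Orchestrator, OrchestratorConfig
+
+TOTAL = 8
+for n_adv in range(0, TOTAL + 1, 2):  # sweep the adversarial share from 0% to 100%
+    config = OrchestratorConfig(n_epochs=15, steps_per_epoch=10, seed=42)
+    orchestrator = Orchestrator(config=config)
+    for i in range(TOTAL - n_adv):
+        orchestrator.register_agent(HonestAgent(agent_id=f"honest_{i}"))
+    for i in range(n_adv):
+        orchestrator.register_agent(AdversarialAgent(agent_id=f"adv_{i}"))
+    final = list(orchestrator.run())[-1]  # metrics for the last epoch
+    print(f"{n_adv / TOTAL:.0%} adversarial: "
+          f"toxicity={final.toxicity_rate:.3f}, welfare={final.total_welfare:.2f}")
+```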
+
+### Governance Cost Paradox (v1.7.0 GasTown study)
+
+A 42-run study reveals that governance reduces toxicity at all adversarial levels (mean reduction 0.071) but imposes net-negative welfare costs under the current parameter tuning. At 0% adversarial, governance costs 216 welfare units (-57.6%) for only a 0.066 toxicity reduction.
+
+## Case Studies
+
+### GasTown Governance Cost
+
+Study governance overhead vs. toxicity reduction across 7 agent compositions with and without governance levers. Reveals the safety-throughput trade-off. See `scenarios/gastown_governance_cost.yaml`.
+
+### LDT Cooperation
+
+220 runs across 10 seeds comparing TDT, FDT, and UDT cooperation strategies at population scales up to 21 agents. See `scenarios/ldt_cooperation.yaml`.
+
+### Moltipedia Heartbeat
+
+Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See `scenarios/moltipedia_heartbeat.yaml`.
+
+### Moltbook CAPTCHA
+
+Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See `scenarios/moltbook_captcha.yaml`.
+
+## API Endpoints (Full Reference)
+
+| Method | Endpoint | Description |
+|---|---|---|
+| GET | `/health` | Health check |
+| GET | `/` | API info |
+| POST | `/api/v1/agents/register` | Register agent |
+| GET | `/api/v1/agents/{agent_id}` | Get agent details |
+| GET | `/api/v1/agents/` | List agents |
+| POST | `/api/v1/scenarios/submit` | Submit scenario |
+| GET | `/api/v1/scenarios/{scenario_id}` | Get scenario |
+| GET | `/api/v1/scenarios/` | List scenarios |
+| POST | `/api/v1/simulations/create` | Create simulation |
+| POST | `/api/v1/simulations/{id}/join` | Join simulation |
+| GET | `/api/v1/simulations/{id}` | Get simulation |
+| GET | `/api/v1/simulations/` | List simulations |
+
+## Citation
+
+```bibtex
+@software{swarm2026,
+  title = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
+  author = {Savitt, Raeli},
+  year = {2026},
+  url = {https://github.com/swarm-ai-safety/swarm}
+}
+```
+
+## Linked Docs
+
+- Skill metadata: `skill.json`
+- Agent discovery: `.well-known/agent.json`
+- Full documentation: `https://github.com/swarm-ai-safety/swarm/tree/main/docs`
+- Theoretical foundations: `docs/research/theory.md`
+- Governance guide: `docs/governance.md`
+- Red-teaming guide: `docs/red-teaming.md`
+- Scenario format: `docs/guides/scenarios.md`
diff --git a/skills/rsavitt/swarm-safety/_meta.json b/skills/rsavitt/swarm-safety/_meta.json
new file mode 100644
index 00000000000..d47ccf1349d
--- /dev/null
+++ b/skills/rsavitt/swarm-safety/_meta.json
@@ -0,0 +1,11 @@
+{
+  "owner": "rsavitt",
+  "slug": "swarm-safety",
+  "displayName": "SWARM Safety",
+  "latest": {
+    "version": "1.7.0",
+    "publishedAt": 1771813733000,
+    "commit": ""
+  },
+  "history": []
+}
diff --git a/skills/rsavitt/swarm-safety/scripts/setup.sh b/skills/rsavitt/swarm-safety/scripts/setup.sh
new file mode 100755
index 00000000000..6692a97a94e
--- /dev/null
+++ b/skills/rsavitt/swarm-safety/scripts/setup.sh
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# SWARM Safety - setup script
+# Installs the swarm-safety package with runtime dependencies
+
+if ! command -v pip &> /dev/null && ! command -v pip3 &> /dev/null; then
+    echo "Error: pip is not installed. Please install Python 3.10+ and pip first."
+    exit 1
+fi
+
+PIP_CMD="pip"
+if ! command -v pip &> /dev/null; then
+    PIP_CMD="pip3"
+fi
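+
+# Optional sanity check (an added sketch, not in the upstream script): the error
+# message above assumes Python 3.10+, so warn if the active interpreter is older.
+PY_VER="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo "0.0")"
+if [ "$(printf '%s\n' "3.10" "$PY_VER" | sort -V | head -n1)" != "3.10" ]; then
+    echo "Warning: Python $PY_VER detected; swarm-safety may require 3.10+."
+fi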
+
+echo "Installing swarm-safety[runtime]>=1.7.0 ..."
+$PIP_CMD install "swarm-safety[runtime]>=1.7.0"
+
+echo "swarm-safety installed successfully."
+echo "Run 'swarm list' to see available scenarios."
diff --git a/skills/rsavitt/swarm-safety/scripts/swarm-cli.sh b/skills/rsavitt/swarm-safety/scripts/swarm-cli.sh
new file mode 100755
index 00000000000..29f24c364cf
--- /dev/null
+++ b/skills/rsavitt/swarm-safety/scripts/swarm-cli.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# SWARM Safety - CLI wrapper
+# Thin wrapper around python -m swarm
+
+exec python3 -m swarm "$@"