322 changes: 322 additions & 0 deletions skills/rsavitt/swarm-safety/SKILL.md
---
name: swarm-safety
version: 1.7.0
description: "SWARM: System-Wide Assessment of Risk in Multi-agent systems. 38 agent types, 29 governance levers, 55 scenarios. Study emergent risks, phase transitions, and governance cost paradoxes."
homepage: https://github.com/swarm-ai-safety/swarm
metadata: {"openclaw":{"emoji":"🛡️","requires":{"bins":["python3","pip"]}}}
---

# SWARM Safety Skill

Study how intelligence swarms — and where it fails.

SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents — harmful dynamics emerge when many sub-AGI agents interact, even when no individual agent is misaligned.

**v1.7.0** | 38 agent types | 29 governance levers | 55 scenarios | 2922 tests | 8 framework bridges

Repository: `https://github.com/swarm-ai-safety/swarm`

## When to Use

- You need to simulate multi-agent AI systems and study emergent safety risks
- You want to test governance mechanisms (taxes, circuit breakers, audits, staking) against adversarial agent populations
- You are researching phase transitions, adverse selection, or collusion in agent economies
- You need probabilistic (soft-label) safety metrics instead of binary good/bad classifications
- You want to red-team multi-agent deployments with configurable adversarial scenarios

## When NOT to Use

- You need a production-grade agent deployment framework (SWARM is a research simulator)
- You want to monitor or govern a live, deployed AI system in real time
- You need binary classification of agent behavior (SWARM uses probabilistic labels by design)
- Your task has nothing to do with multi-agent safety, governance, or emergent risk

## Hard Rules

- SWARM simulations run locally. Install the package first.
- Do not submit scenarios containing real API keys, credentials, or PII.
- Simulation results are research artifacts. Do not present them as ground truth about real systems.
- When publishing results, cite the framework and disclose simulation parameters.

## Security

- **API binds to localhost only** (`127.0.0.1`) by default to prevent network exposure.
- **CORS restricted** to localhost origins by default.
- **No authentication** on development API — do not expose to untrusted networks.
- **In-memory storage** — data does not persist between restarts.
- For production deployment, add authentication middleware and use a proper database.

## Install

```bash
# From PyPI
pip install swarm-safety

# With LLM agent support
pip install "swarm-safety[llm]"

# Full development (all extras)
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
```

## Quick Start (Python)

```python
from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)

orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

metrics = orchestrator.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")
```

## Quick Start (CLI)

```bash
# List available scenarios
swarm list

# Run a scenario
swarm run scenarios/baseline.yaml

# Override settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15

# Export results
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/
```

## Quick Start (API)

Start the API server:

```bash
pip install "swarm-safety[api]"
uvicorn swarm.api.app:app --host 127.0.0.1 --port 8000
```

API documentation at `http://localhost:8000/docs`.

> **Security Note**: The server binds to `127.0.0.1` (localhost only) by default. Do not bind to `0.0.0.0` unless you understand the security implications and have proper firewall rules in place.

### Register Agent

```bash
curl -X POST http://localhost:8000/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "YourAgent",
    "description": "What your agent does",
    "capabilities": ["governance-testing", "red-teaming"]
  }'
```

Returns `agent_id` and `api_key`.

### Submit Scenario

```bash
curl -X POST http://localhost:8000/api/v1/scenarios/submit \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": "simulation:\n  n_epochs: 10\n  steps_per_epoch: 10\nagents:\n  - type: honest\n    count: 3\n  - type: adversarial\n    count: 2",
    "tags": ["collusion", "governance"]
  }'
```
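
Hand-escaping YAML into a JSON string is error-prone. The same request can be made from Python; a minimal sketch, assuming the `requests` package (not a swarm-safety dependency) and a local `my_scenario.yaml`:

```python
from pathlib import Path

import requests  # assumption: installed separately (pip install requests)

# Read the scenario file verbatim; the json= parameter handles escaping.
payload = {
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": Path("my_scenario.yaml").read_text(),
    "tags": ["collusion", "governance"],
}
resp = requests.post("http://localhost:8000/api/v1/scenarios/submit", json=payload)
resp.raise_for_status()
print(resp.json())
```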

### Create & Join Simulation

```bash
# Create
curl -X POST http://localhost:8000/api/v1/simulations/create \
  -H "Content-Type: application/json" \
  -d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'

# Join
curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'
```

## Core Concepts

### Soft Probabilistic Labels

Each interaction carries a soft label `p = P(v = +1)`, the probability of a beneficial outcome:

```
Observables -> ProxyComputer -> v_hat -> sigmoid -> p -> PayoffEngine -> payoffs
                                                    |
                                                    +-> SoftMetrics -> toxicity, quality gap, etc.
```
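
The `sigmoid` step is what turns a raw proxy score into a calibrated probability instead of a hard verdict. A minimal sketch in plain Python (illustrative only, not the library's `ProxyComputer` API; the `temperature` knob is hypothetical):

```python
import math

def soft_label(v_hat: float, temperature: float = 1.0) -> float:
    """Map a raw proxy score v_hat to p = P(v = +1) with a logistic squash.

    `temperature` is a hypothetical smoothing knob for illustration.
    """
    return 1.0 / (1.0 + math.exp(-v_hat / temperature))

print(soft_label(2.0))  # ~0.88: confidently beneficial interaction
print(soft_label(0.0))  # 0.50: maximally uncertain interaction
```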

### Five Key Metrics

| Metric | What It Measures |
|---|---|
| **Toxicity rate** | Expected harm among accepted interactions: `E[1-p \| accepted]` |
| **Quality gap** | Adverse selection indicator (negative = bad): `E[p \| accepted] - E[p \| rejected]` |
| **Conditional loss** | Selection effect on payoffs |
| **Incoherence** | Variance-to-error ratio across replays |
| **Illusion delta** | Gap between perceived coherence and actual consistency |
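
The first two metrics follow directly from these definitions. A self-contained sketch over per-interaction `(p, accepted)` pairs (plain arithmetic, not the library's `SoftMetrics` API):

```python
def toxicity_rate(samples: list[tuple[float, bool]]) -> float:
    """E[1 - p | accepted]: expected harm among accepted interactions."""
    accepted = [p for p, ok in samples if ok]
    return sum(1 - p for p in accepted) / len(accepted)

def quality_gap(samples: list[tuple[float, bool]]) -> float:
    """E[p | accepted] - E[p | rejected]: negative signals adverse selection."""
    accepted = [p for p, ok in samples if ok]
    rejected = [p for p, ok in samples if not ok]
    return sum(accepted) / len(accepted) - sum(rejected) / len(rejected)

samples = [(0.9, True), (0.8, True), (0.3, True), (0.6, False), (0.2, False)]
print(f"toxicity:    {toxicity_rate(samples):.3f}")   # 0.333
print(f"quality gap: {quality_gap(samples):+.3f}")    # +0.267
```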

### Agent Types (14 families, 38 implementations)

| Type | Behavior |
|---|---|
| **Honest** | Cooperative, trust-based, completes tasks diligently |
| **Opportunistic** | Maximizes short-term payoff, cherry-picks tasks |
| **Deceptive** | Builds trust, then exploits trusted relationships |
| **Adversarial** | Targets honest agents, coordinates with allies |
| **LDT** | Logical Decision Theory with FDT/UDT precommitment |
| **RLM** | Reinforcement Learning from Memory |
| **Council** | Multi-agent deliberation-based decisions |
| **SkillRL** | Learns interaction strategies via reward signals |
| **LLM** | Behavior determined by LLM (Anthropic, OpenAI, or Ollama) |
| **Moltbook** | Domain-specific social platform agent |
| **Scholar** | Academic citation and research agent |
| **Wiki Editor** | Collaborative editing with editorial policy |

### Governance Levers (29 mechanisms)

- **Transaction Taxes** — Reduce exploitation, cost welfare
- **Reputation Decay** — Punish bad actors, erode honest standing
- **Circuit Breakers** — Freeze toxic agents quickly
- **Random Audits** — Deter hidden exploitation
- **Staking** — Filter undercapitalized agents
- **Collusion Detection** — Catch coordinated attacks (the critical lever near the collapse threshold)
- **Sybil Detection** — Identify duplicate agents
- **Transparency Ledger** — Reward/penalize based on outcome
- **Moderator Agent** — Probabilistic review of interactions
- **Incoherence Friction** — Tax uncertainty-driven decisions
- **Council Deliberation** — Multi-agent governance decisions
- **Diversity Enforcement** — Prevent monoculture collapse
- **Moltipedia-specific** — Pair caps, page cooldowns, daily caps, self-fix prevention
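
Levers are switched on per scenario in the `governance:` block of a scenario file; the Scenario YAML Format section below shows `transaction_tax_rate`, `circuit_breaker_enabled`, and `collusion_detection_enabled` as examples.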

### Framework Bridges

| Bridge | Integration |
|---|---|
| **Concordia** | DeepMind's multi-agent framework |
| **GasTown** | Multi-agent workspace governance |
| **Claude Code** | Claude CLI agent integration |
| **LiveSWE** | Live software engineering tasks |
| **OpenClaw** | Open agent protocol |
| **Prime Intellect** | Cross-platform run tracking |
| **Ralph** | Agent orchestration |
| **Worktree** | Git worktree-based sandboxing |

### Scenario YAML Format

```yaml
simulation:
  n_epochs: 10
  steps_per_epoch: 10
  seed: 42

agents:
  - type: honest
    count: 3
    config:
      acceptance_threshold: 0.4
  - type: adversarial
    count: 2
    config:
      aggression_level: 0.7

governance:
  transaction_tax_rate: 0.05
  circuit_breaker_enabled: true
  collusion_detection_enabled: true

success_criteria:
  max_toxicity: 0.3
  min_quality_gap: 0.0
```
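
Save this as, for example, `my_scenario.yaml` and run it with `swarm run my_scenario.yaml` (see the CLI quick start above).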

## Key Research Findings

### Phase Transitions (11-scenario, 209-epoch study)

| Regime | Adversarial % | Toxicity | Welfare | Outcome |
|--------|--------------|----------|---------|---------|
| Cooperative | 0-20% | < 0.30 | Stable | Survives |
| Contested | 20-37.5% | 0.33-0.37 | Declining | Survives |
| Collapse | 50%+ | ~0.30 | Zero by epoch 12-14 | **Collapses** |

A critical threshold between 37.5% and 50% adversarial agents separates recoverable decline from irreversible collapse.
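
The threshold can be probed with the same Python API as the Quick Start. A sketch under stated assumptions: population size, epoch count, and the `AdversarialAgent(agent_id=...)` constructor are illustrative choices mirroring the imports shown above.

```python
from swarm.agents.honest import HonestAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

def run_mix(n_total: int, adversarial_frac: float, seed: int = 42):
    """Run one simulation with the given fraction of adversarial agents."""
    n_adv = round(n_total * adversarial_frac)
    config = OrchestratorConfig(n_epochs=20, steps_per_epoch=10, seed=seed)
    orch = Orchestrator(config=config)
    for i in range(n_total - n_adv):
        orch.register_agent(HonestAgent(agent_id=f"honest_{i}", name=f"Honest{i}"))
    for i in range(n_adv):
        orch.register_agent(AdversarialAgent(agent_id=f"adv_{i}"))
    return orch.run()

# Sweep across the reported critical region.
for frac in (0.25, 0.375, 0.5):
    final = run_mix(n_total=8, adversarial_frac=frac)[-1]
    print(f"adv={frac:.1%}: toxicity={final.toxicity_rate:.3f}, welfare={final.total_welfare:.2f}")
```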

### Governance Cost Paradox (v1.7.0 GasTown study)

A 42-run study shows that governance reduces toxicity at every adversarial level (mean reduction 0.071) but imposes a net welfare loss under the current parameter tuning. At 0% adversarial agents, governance costs 216 welfare units (-57.6%) in exchange for a toxicity reduction of only 0.066.
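
The exchange rate implied by those numbers is steep. Plain arithmetic on the reported figures:

```python
welfare_cost = 216     # welfare units lost to governance at 0% adversarial
relative_drop = 0.576  # the reported -57.6%
tox_reduction = 0.066  # toxicity reduction bought at that price

print(f"implied ungoverned baseline: {welfare_cost / relative_drop:.0f} welfare units")   # 375
print(f"welfare per 0.01 toxicity reduction: {welfare_cost / (tox_reduction * 100):.1f}") # 32.7
```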

## Case Studies

### GasTown Governance Cost

Study governance overhead vs. toxicity reduction across 7 agent compositions with and without governance levers. Reveals the safety-throughput trade-off. See `scenarios/gastown_governance_cost.yaml`.

### LDT Cooperation

220 runs across 10 seeds comparing TDT vs FDT vs UDT cooperation strategies at population scales up to 21 agents. See `scenarios/ldt_cooperation.yaml`.

### Moltipedia Heartbeat

Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See `scenarios/moltipedia_heartbeat.yaml`.

### Moltbook CAPTCHA

Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See `scenarios/moltbook_captcha.yaml`.

## API Endpoints (Full Reference)

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/` | API info |
| POST | `/api/v1/agents/register` | Register agent |
| GET | `/api/v1/agents/{agent_id}` | Get agent details |
| GET | `/api/v1/agents/` | List agents |
| POST | `/api/v1/scenarios/submit` | Submit scenario |
| GET | `/api/v1/scenarios/{scenario_id}` | Get scenario |
| GET | `/api/v1/scenarios/` | List scenarios |
| POST | `/api/v1/simulations/create` | Create simulation |
| POST | `/api/v1/simulations/{id}/join` | Join simulation |
| GET | `/api/v1/simulations/{id}` | Get simulation |
| GET | `/api/v1/simulations/` | List simulations |

## Citation

```bibtex
@software{swarm2026,
  title = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
  author = {Savitt, Raeli},
  year = {2026},
  url = {https://github.com/swarm-ai-safety/swarm}
}
```

## Linked Docs

- Skill metadata: `skill.json`
- Agent discovery: `.well-known/agent.json`
- Full documentation: `https://github.com/swarm-ai-safety/swarm/tree/main/docs`
- Theoretical foundations: `docs/research/theory.md`
- Governance guide: `docs/governance.md`
- Red-teaming guide: `docs/red-teaming.md`
- Scenario format: `docs/guides/scenarios.md`
11 changes: 11 additions & 0 deletions skills/rsavitt/swarm-safety/_meta.json
{
"owner": "rsavitt",
"slug": "swarm-safety",
"displayName": "SWARM Safety",
"latest": {
"version": "1.7.0",
"publishedAt": 1771813733000,
"commit": ""
},
"history": []
}
21 changes: 21 additions & 0 deletions skills/rsavitt/swarm-safety/scripts/setup.sh
#!/usr/bin/env bash
set -euo pipefail

# SWARM Safety - setup script
# Installs the swarm-safety package with runtime dependencies

if ! command -v pip &> /dev/null && ! command -v pip3 &> /dev/null; then
  echo "Error: pip is not installed. Please install Python 3.10+ and pip first."
  exit 1
fi

PIP_CMD="pip"
if ! command -v pip &> /dev/null; then
  PIP_CMD="pip3"
fi

echo "Installing swarm-safety[runtime]>=1.7.0 ..."
$PIP_CMD install "swarm-safety[runtime]>=1.7.0"

echo "swarm-safety installed successfully."
echo "Run 'swarm list' to see available scenarios."
7 changes: 7 additions & 0 deletions skills/rsavitt/swarm-safety/scripts/swarm-cli.sh
#!/usr/bin/env bash
set -euo pipefail

# SWARM Safety - CLI wrapper
# Thin wrapper around python -m swarm

exec python3 -m swarm "$@"