AgentForge is a Foundry-native framework for adversarial, agent-based simulation of EVM mechanisms over time.
Note: AgentForge is currently in alpha. APIs may change and you may encounter bugs.
| Layer | Tests | Example |
|---|---|---|
| Unit tests | Individual functions | `test_transfer()` |
| Fuzz tests | Random inputs | `testFuzz_transfer(uint256 amount)` |
| AgentForge | Multi-actor emergent behavior | Traders, arbitrageurs, liquidators competing |
| Mainnet | Real users | Production |
AgentForge fills the gap between isolated tests and production by simulating how your protocol behaves when many autonomous agents act simultaneously with different strategies over time.
Each simulation run produces durable artifacts:
```
results/<scenario>-<timestamp>/
├── summary.json          # Run metadata, final metrics, assertion results
├── metrics.csv           # Time-series data for analysis
├── actions.ndjson        # Complete action log
├── config_resolved.json  # Resolved configuration for reproducibility
├── replay_bundle.json    # Exploration trace bundle for replay mode
├── smoke_results.json    # Assumption perturbation divergence outputs (optional)
└── report.md             # Generated report (optional)
```
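Because the artifacts are plain JSON/CSV/NDJSON, they are easy to post-process with ordinary tooling. A minimal sketch that tallies actions from an `actions.ndjson` payload (the `name` field here is illustrative; treat the real record schema as subject to change while AgentForge is in alpha):

```typescript
// Tally action names from an actions.ndjson payload.
// Assumes each line is a JSON object with a `name` field (illustrative schema).
function tallyActions(ndjson: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of ndjson.split('\n')) {
    if (!line.trim()) continue; // skip blank lines
    const action = JSON.parse(line) as { name?: string };
    const name = action.name ?? 'unknown';
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return counts;
}

// Example payload: two buys and a sell
const sample = [
  '{"name":"buy","tick":1}',
  '{"name":"sell","tick":2}',
  '{"name":"buy","tick":3}',
].join('\n');

const counts = tallyActions(sample);
// counts.get('buy') === 2, counts.get('sell') === 1
```

The same pattern works for streaming a large log line-by-line instead of loading it whole.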
Plus reporting commands:
- `forge-sim report <runDir>` — Generate a Markdown report
- `forge-sim dashboard <runDir>` — Build a static dashboard folder for a run
- `forge-sim studio` — Launch multi-run Studio (sessions, paging, analytics)
- `forge-sim serve <runDir>` — Serve a generated dashboard over HTTP
- `forge-sim compare <runA> <runB>` — Diff two runs
- `forge-sim sweep <scenario> --seeds 1..50` — Multi-seed statistical analysis
```bash
pnpm add @elata-biosciences/agentforge
```

Requirements: Node.js 18+ and Foundry (with Anvil) for EVM simulations.
```bash
# Initialize project structure
npx forge-sim init

# Run built-in toy scenario to verify setup
npx forge-sim run --toy

# Check environment
npx forge-sim doctor
```

A scenario defines simulation parameters: seed, duration, agents, and assertions.
```ts
import { defineScenario } from '@elata-biosciences/agentforge';
import { ToyPack, RandomTraderAgent, MomentumAgent } from '@elata-biosciences/agentforge/toy';

export default defineScenario({
  name: 'market-stress',
  seed: 42,
  ticks: 100,
  tickSeconds: 3600,
  pack: new ToyPack({
    assets: [{ name: 'TOKEN', initialPrice: 100, volatility: 0.05 }],
    initialCash: 10000,
  }),
  agents: [
    { type: RandomTraderAgent, count: 10 },
    { type: MomentumAgent, count: 5, params: { threshold: 0.02 } },
  ],
  assertions: [
    { type: 'gt', metric: 'totalVolume', value: 0 },
    { type: 'gte', metric: 'successRate', value: 0.9 },
  ],
});
```

Agents are autonomous actors that observe state and decide actions each tick.
```ts
import { BaseAgent, type Action, type TickContext } from '@elata-biosciences/agentforge';

export class MyAgent extends BaseAgent {
  async step(ctx: TickContext): Promise<Action | null> {
    // 30% chance to buy each tick
    if (ctx.rng.chance(0.3)) {
      return {
        id: this.generateActionId('buy', ctx.tick),
        name: 'buy',
        params: { amount: ctx.rng.nextInt(1, 100), asset: 'TOKEN' },
      };
    }
    return null; // Skip this tick
  }
}
```

Agents have access to:
- `ctx.rng` — Deterministic random number generator
- `ctx.world` — Current protocol state
- `this.remember()` / `this.recall()` — Persist state across ticks
- `this.setCooldown()` / `this.isOnCooldown()` — Rate-limit actions
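To make the memory and cooldown semantics concrete, here is a self-contained sketch of how such helpers might behave. This is not AgentForge's implementation, just an illustration of the tick-based semantics described above:

```typescript
// Illustrative stand-ins for BaseAgent's memory and cooldown helpers.
// NOT AgentForge's actual implementation; a sketch of the semantics only.
class MemoryAndCooldowns {
  private memory = new Map<string, unknown>();
  private cooldowns = new Map<string, number>(); // key -> tick at which it expires

  remember(key: string, value: unknown): void {
    this.memory.set(key, value); // persists across ticks
  }
  recall<T>(key: string): T | undefined {
    return this.memory.get(key) as T | undefined;
  }
  setCooldown(key: string, currentTick: number, ticks: number): void {
    this.cooldowns.set(key, currentTick + ticks);
  }
  isOnCooldown(key: string, currentTick: number): boolean {
    return (this.cooldowns.get(key) ?? 0) > currentTick;
  }
}

const state = new MemoryAndCooldowns();
state.remember('lastPrice', 100);
state.setCooldown('buy', /* currentTick */ 5, /* ticks */ 3);
state.isOnCooldown('buy', 6); // true: cooldown expires at tick 8
state.isOnCooldown('buy', 8); // false
```

An agent would typically check `isOnCooldown` at the top of `step()` and return `null` while rate-limited.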
Packs are protocol adapters that set up blockchain state and handle contract interactions.
Same seed + same scenario = identical results. All randomness derives from seeded RNG.
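The guarantee hinges on every random draw coming from a generator initialized with the run's seed. A self-contained sketch of the idea using mulberry32 (AgentForge's actual RNG algorithm may differ; this only shows why equal seeds yield identical streams):

```typescript
// A tiny seeded PRNG (mulberry32). AgentForge's ctx.rng may use a different
// algorithm; this demonstrates why equal seeds produce identical runs.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

const runA = mulberry32(123);
const runB = mulberry32(123);
// Same seed: the two streams are identical draw-for-draw.
console.log(runA() === runB()); // true
```

If any agent pulled randomness from `Math.random()` instead of the seeded stream, the reproducibility guarantee would break, which is why all randomness must flow through `ctx.rng`.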
```bash
# Verify determinism
forge-sim run --toy --seed 123 --out run1 --ci
forge-sim run --toy --seed 123 --out run2 --ci
forge-sim compare run1/toy-market-ci run2/toy-market-ci
# Should report identical artifact hashes
```

AgentForge is particularly useful for stress-testing mechanism designs. See `examples/mechanism-experiments/` for runnable examples:
Explore how transaction ordering affects value capture and leakage:
```bash
cd examples/mechanism-experiments/ordering-tax
npx forge-sim run scenario.ts --seed 42
```

Questions it helps answer:
- How does priority ordering vs. random ordering affect searcher profits?
- What is the user slippage distribution under different ordering regimes?
- How do tail outcomes change across ordering policies?
Analyze how information timing affects auction outcomes:
```bash
cd examples/mechanism-experiments/timing-auction
npx forge-sim run scenario.ts --seed 42
```

Questions it helps answer:
- How much advantage does a "fast actor" gain from late information?
- Does commit-reveal mitigate timing advantages?
- What is the impact on seller revenue and bidder participation?
```bash
forge-sim report sim/results/stress-ci
```

Produces `report.md` with run metadata, KPI summary, time-series statistics, action analysis, and a determinism fingerprint.
```bash
forge-sim compare baseline/stress-ci current/stress-ci
```

Produces `compare.md` with metadata diff, KPI deltas, and behavioral changes.
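The KPI-delta idea behind a comparison is straightforward to sketch. The `metrics` shape below is hypothetical (not the real `summary.json` schema), but it shows the kind of per-metric diff a compare produces:

```typescript
// Sketch of the KPI-delta computation behind a run comparison.
// The `metrics` field shape is hypothetical, not the real summary.json schema.
type Summary = { metrics: Record<string, number> };

function kpiDeltas(base: Summary, current: Summary): Record<string, number> {
  const deltas: Record<string, number> = {};
  for (const key of Object.keys(base.metrics)) {
    if (key in current.metrics) {
      deltas[key] = current.metrics[key] - base.metrics[key];
    }
  }
  return deltas;
}

const baselineRun: Summary = { metrics: { totalVolume: 1000, successRatePct: 95 } };
const currentRun: Summary = { metrics: { totalVolume: 1200, successRatePct: 90 } };

const deltas = kpiDeltas(baselineRun, currentRun);
// deltas.totalVolume === 200, deltas.successRatePct === -5
```

A CI job can fail on regressions by thresholding these deltas rather than eyeballing the report.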
```bash
forge-sim sweep sim/scenarios/stress.ts --seeds 1..50
```

Runs the scenario with 50 different seeds and produces aggregate statistics: percentiles (P05/P50/P95), tail-risk analysis, and a per-seed summary CSV.
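To illustrate the aggregation, here is a nearest-rank percentile over per-seed metric values. The interpolation method is an assumption; the sweep report's exact percentile algorithm is not specified here:

```typescript
// Nearest-rank percentile over per-seed metric values (illustrative method;
// the sweep report may use a different interpolation).
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// e.g. final totalVolume observed across 10 seeds
const perSeed = [120, 95, 130, 110, 105, 140, 90, 125, 115, 100];
percentile(perSeed, 5);  // 90  (tail-risk floor, P05)
percentile(perSeed, 50); // 110 (median, P50)
percentile(perSeed, 95); // 140 (P95)
```

Tail percentiles are the main reason to sweep: a mechanism that looks fine at the median can still leak badly in the worst seeds.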
```yaml
- name: Run simulations
  run: npx forge-sim run sim/scenarios/stress.ts --ci --seed 42

- name: Upload artifacts
  uses: actions/upload-artifact@v4
  if: always()
  with:
    name: simulation-results
    path: sim/results/
```

Exit codes:

- `0` — Success (all assertions passed)
- `1` — Assertion failure
- `2` — Infrastructure error
See docs/ci.md for detailed CI recipes.
```
forge-sim init [path]            # Initialize simulation folder
forge-sim run <scenario>         # Execute a scenario
forge-sim run --toy              # Run built-in demo
forge-sim report <runDir>        # Generate report from artifacts
forge-sim dashboard <runDir>     # Generate static dashboard/ for a run
forge-sim serve <runDir>         # Serve run dashboard over HTTP
forge-sim studio                 # Launch local Studio UI
forge-sim extract-agent <bundle> # Generate deterministic agent from replay_bundle.json
forge-sim compare <runA> <runB>  # Compare two runs
forge-sim sweep <scenario>       # Multi-seed statistical run
forge-sim matrix <scenario>      # Multi-variant matrix run and comparison report
forge-sim doctor                 # Check dependencies
forge-sim types                  # Generate types from Foundry artifacts
```

Options for `run`:
```
--seed <n>            # Override random seed
--ticks <n>           # Override tick count
--out <dir>           # Output directory
--mode <mode>         # deterministic | exploration | replay
--replay-bundle <p>   # Replay bundle path for mode=replay
--capture-memory      # Persist agent memory snapshots to agent_memory.ndjson
--live                # Enable live websocket event stream
--ci                  # CI mode (no colors, stable naming)
--verbose             # Verbose logging
--json                # Output results as JSON
```

Mode guidance:

- `deterministic` — no live LLM calls; best for baseline and CI checks
- `exploration` — LLM-enabled red-team discovery; produces `replay_bundle.json`
- `replay` — deterministic re-run of prior exploration traces against updated contracts
`PersonaLlmAgentBase` provides a reusable base for persona-driven LLM agents with:

- persona profile (`id`, `style`, goals, risk profile, tool preferences)
- structured prompt assembly (persona + world + memory + capability-manifest context)
- two-stage decision flow (plan -> action) with fallback to single-shot action parsing
- schema-validated action intents with optional persona metadata

Use it directly for custom agents, or subclass it for `LlmPolicyAgent`-compatible patterns.

```ts
import { PersonaLlmAgentBase, type PersonaProfile } from '@elata-biosciences/agentforge';
```

Exploration-mode `RpcCall` autonomy supports two policies:
- `strict` (default): requires an explicit scenario allowlist
- `aggressive`: allows any non-empty RPC method when no explicit allowlist is configured

Controls:

- scenario config: `exploration.autonomousRpcPolicy = 'strict' | 'aggressive'`
- env override: `AGENTFORGE_AUTONOMOUS_RPC_POLICY=strict|aggressive`
- kill switch: `AGENTFORGE_DISABLE_AUTONOMOUS_RPC=1`
Example:

```ts
exploration: {
  allowArbitraryExecution: true,
  autonomousRpcPolicy: 'aggressive',
  disableAutonomousRpc: false,
  allowlist: { allowedContracts: [], allowedRpcMethods: [] },
}
```

Each `TickContext` can include a capabilities manifest (version, tools, query endpoints, contracts, action templates). Packs can provide a rich manifest via `getCapabilityManifest()`; otherwise AgentForge builds a conservative fallback manifest from known tools and query endpoints.
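For orientation, a plausible shape for such a manifest is sketched below. The interface name and field types here are illustrative assumptions, not AgentForge's actual type definitions:

```typescript
// Illustrative manifest shape; the real type exposed by packs may differ.
interface CapabilityManifest {
  version: string;
  tools: string[];             // callable tool names exposed to agents
  queryEndpoints: string[];    // read-only state queries
  contracts: { name: string; address: string }[];
  actionTemplates: { name: string; params: string[] }[];
}

const manifest: CapabilityManifest = {
  version: '1',
  tools: ['swap', 'quote'],
  queryEndpoints: ['getPrice', 'getBalance'],
  contracts: [
    { name: 'TOKEN', address: '0x0000000000000000000000000000000000000000' },
  ],
  actionTemplates: [{ name: 'buy', params: ['asset', 'amount'] }],
};
```

Keeping the manifest narrow (only the tools and endpoints a scenario actually needs) is the conservative default, especially in exploration mode.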
```ts
// Core
import { defineScenario, BaseAgent, SimulationEngine } from '@elata-biosciences/agentforge';
import type { Scenario, Action, TickContext, Pack } from '@elata-biosciences/agentforge';

// Adapters
import { spawnAnvil, createViemClient } from '@elata-biosciences/agentforge/adapters';

// Toy simulation
import { ToyPack, RandomTraderAgent, MomentumAgent } from '@elata-biosciences/agentforge/toy';
```

- Core Concepts — Scenarios, agents, ticks, packs, determinism
- CLI Reference — Full command/option coverage for every `forge-sim` command
- CI Integration — GitHub Actions, GitLab CI, exit codes
- Reporting — Report, compare, and sweep commands
- LLM/Gossip Workflow — Exploration, replay, and information diffusion
- Portability — Using AgentForge in other repos and with different LLM providers
- Competitive Landscape — How AgentForge complements Foundry/Echidna and differs from dashboards
- `examples/basic-simulation/` — Minimal setup with ToyPack
- `examples/custom-agent/` — Memory, cooldowns, and parameterized behavior
- `examples/assertions/` — Assertion validation patterns
- `examples/metrics-tracking/` — CSV analysis and statistics
- `examples/mechanism-experiments/` — Ordering and timing experiments
- Replay minimization: shrink failing replay bundles to minimal reproductions
- Extended ordering policies: custom ordering, bundle simulation
- Richer Studio analytics: larger-run workflows and stronger cross-run slicing
- Elata Protocol — App launchpad with bonding curves
See CONTRIBUTING.md.
MIT