AI Symphony

A musical benchmark for AI coding agents.

Traditional benchmarks ask: "Did the agent pass?" AI Symphony asks: "Can you hear where the agent forgot its instructions?"

Every rule, skill, memory entry, and MCP capability of an AI coding agent is mapped to a single note. When the agent correctly activates a capability in the right context, the note plays. When it forgets, the note goes silent. The result is an audible signature of the agent's steering reliability.

The richer and more complex the melody an agent can play without dropping a note, the stronger its harness.

This repo is the first PoC. It includes:

A 9-note "Common Score" — the Ode to Joy main theme.
A protocol spec (SPEC.md) defining note tokens, scoring, and adapter contracts.
An automated Claude Code adapter that runs a real claude -p test and emits a .wav you can play.
A Cursor adapter consisting of 9 .mdc rule files plus a manual runbook (Cursor is GUI-driven; v0 collects its output by hand).

Hear it

python -m harness_symphony.run claude-code
# writes out/claude-code-<timestamp>.wav
afplay out/claude-code-*.wav   # macOS

A perfect run plays the first phrase of Ode to Joy:

E E F G | G F E D | C

Each missing capability is replaced by silence at that beat. A fragile agent sounds like a music box with broken teeth.
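
The "silence at that beat" rendering can be sketched with the Python standard library alone. This is a minimal illustration and not the repo's actual renderer: the `render` function, the beat length, the sample rate, and the sine-wave timbre are all assumptions made for the example. Only the nine pitches come from the score above.

```python
import math
import struct
import wave

# Equal-temperament frequencies (A4 = 440 Hz) for the pitches in the Common Score.
FREQS = {"C4": 261.63, "D4": 293.66, "E4": 329.63, "F4": 349.23, "G4": 392.00}
SCORE = ["E4", "E4", "F4", "G4", "G4", "F4", "E4", "D4", "C4"]


def render(fired, path, beat_s=0.4, rate=22050):
    """Hypothetical renderer: one beat per note; beats not in `fired` stay silent.

    `fired` is a set of 0-based beat indices whose note token was observed.
    Returns the total number of samples written.
    """
    n = int(beat_s * rate)
    frames = bytearray()
    for i, pitch in enumerate(SCORE):
        freq = FREQS[pitch] if i in fired else None
        for t in range(n):
            if freq is None:
                sample = 0  # dropped note: silence for the whole beat
            else:
                # Short linear fade in/out so note boundaries do not click.
                env = min(1.0, t / 200, (n - t) / 200)
                sample = int(0.5 * 32767 * env * math.sin(2 * math.pi * freq * t / rate))
            frames += struct.pack("<h", sample)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit PCM
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return len(frames) // 2
```

Calling `render(set(range(9)), "perfect.wav")` writes the full phrase, while `render({0, 1, 2, 4, 5, 6, 7, 8}, "dropped.wav")` leaves a gap where the fourth note (G4) should sound.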

The 9-note Common Score

#	Pitch	Capability layer	Trigger
1	E4	always rule	every response
2	E4	project / repo-global instruction	project identity
3	F4	file-glob rule	`**/*.py`
4	G4	file-glob rule	`**/*.tsx`, `**/*.jsx`
5	G4	file-glob rule	`**/*.nf`, `nextflow.config`
6	F4	path-glob rule	`**/auth/**`, `**/security/**`
7	E4	agent-requested rule	QMS / verification topics
8	D4	agent-requested rule	performance / optimization topics
9	C4	manual rule	`@hotfix` in prompt

Each rule instructs the agent: "When you are active, print <NOTE:NAME> exactly once." The harness parses the agent's output for these tokens, renders the WAV, and reports a coverage score.

See SPEC.md for the full protocol — token format, scoring rules, and how to add new adapters.
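
As a rough illustration of the parsing step, the sketch below scans agent output for `<NOTE:NAME>` tokens and computes a coverage figure. It is an assumption-laden example, not the harness's implementation: the `coverage` function, the character set allowed in `NAME`, the note names used in the usage line, and the definition of coverage as "expected notes seen at least once, divided by expected notes" are all hypothetical. SPEC.md remains the authority on token format and scoring.

```python
import re
from collections import Counter

# Assumed token shape: <NOTE:NAME>, where NAME is letters, digits, "_" or "-".
TOKEN_RE = re.compile(r"<NOTE:([A-Za-z0-9_\-]+)>")


def coverage(output, expected):
    """Hypothetical scorer: compare tokens found in `output` to `expected` names."""
    counts = Counter(TOKEN_RE.findall(output))
    hit = [name for name in expected if counts[name] >= 1]
    return {
        "hit": hit,
        # Expected notes that never appeared (silent beats).
        "missing": [name for name in expected if counts[name] == 0],
        # Notes printed more than once, which breaks the "exactly once" instruction.
        "duplicated": [name for name in expected if counts[name] > 1],
        # Tokens the agent emitted that are not part of the score.
        "unexpected": sorted(set(counts) - set(expected)),
        "coverage": len(hit) / len(expected) if expected else 0.0,
    }
```

For example, `coverage("done <NOTE:ALWAYS> <NOTE:PY>", ["ALWAYS", "PY", "TSX"])` reports `TSX` as missing. Tracking duplicates separately is a design choice of this sketch, since each rule asks for its token exactly once.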

Levels

Level	Form	What it measures
1 — Scale Test	Each rule fired alone, one at a time	Per-capability availability
2 — Melody Test	All 9 rules engaged in one prompt (this repo's v0)	Steering reliability under load
3 — Symphony Test	Multiple capability layers (rules + skills + memory + MCP) running concurrently across multi-turn workflows	End-to-end harness coherence

This PoC ships Level 2 for the Common Score. Per-agent Extended Scores (each agent's full capability surface — Cursor's auto-attach, Claude Code's skills/hooks, Antigravity/Kiro steering, etc.) come next.

Why this is interesting

A standard benchmark gives you a number. AI Symphony gives you a signal you can listen to: a continuous, human-perceptible representation of which parts of an agent's steering layer fired when they should have. Drop-outs are obvious. Stable runs sound stable. The resulting .wav file is shareable and embeddable, and it sidesteps the screenshot-doesn't-tell-the-whole-story problem that plagues agent demos.

It also separates two things that benchmarks usually conflate:

Capability gap — the agent doesn't have this layer (it's a missing instrument, not a wrong note).
Reliability failure — the agent has the layer but didn't fire it (the instrument was there but skipped a beat).

The Common Score is designed so that every modern agent has every instrument, so a missing note in the Common Score is unambiguously a reliability failure.
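
The distinction between the two failure types can be made concrete with a small classifier. This is an illustrative sketch only: the `Outcome` enum, the `classify` function, and the idea of passing in a `supported` set per agent are assumptions for the example, not names taken from the repo or SPEC.md.

```python
from enum import Enum


class Outcome(Enum):
    PLAYED = "played"
    CAPABILITY_GAP = "capability gap"            # missing instrument
    RELIABILITY_FAILURE = "reliability failure"  # instrument present, beat skipped


def classify(expected, supported, fired):
    """Hypothetical helper: label each expected note for one run.

    expected  -- ordered note names in the score
    supported -- set of notes whose capability layer the agent actually has
    fired     -- set of notes whose token appeared in the agent's output
    """
    result = {}
    for note in expected:
        if note in fired:
            result[note] = Outcome.PLAYED
        elif note not in supported:
            result[note] = Outcome.CAPABILITY_GAP
        else:
            result[note] = Outcome.RELIABILITY_FAILURE
    return result
```

For a Common Score run, `supported` would contain every expected note, so any note absent from `fired` comes back as `Outcome.RELIABILITY_FAILURE`.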

Status

PoC v0. The Claude Code adapter is automated; the Cursor adapter ships rule files plus a manual runbook. This is not a stable API yet, and the SPEC will iterate.

The internal technical name for the protocol is Harness Symphony Protocol. The public/demo brand is AI Symphony.
