Motivation
Lab journals 09 and 10 found two recurring classes of conclave failure that simpler validation would have caught at $0:
- Writer summary hallucinates blocker counts (journal 09 run 150, journal 10 run 157 — "3 blockers" / "4 blockers" in the summary string while the body lists different numbers).
- Writer passes through specialist contradictions (journal 10: Security flagged absolute-path traversal as a blocker while Conventions praised the same code as correctly sandboxed, in the same run).
Both are structural defects the conclave can't detect today because agents emit free-form text and the Writer has no schema to validate against. See .notes/09-runtime-ts-review-matrix-and-kb-paradox.md and .notes/10-builtin-tools-review-matrix.md for full context.
What Anthropic gives us
Anthropic shipped structured outputs GA across Opus 4.6 / Sonnet 4.6 / Sonnet 4.5 / Haiku 4.5 (Claude API + Bedrock). Two complementary features:
- JSON outputs (output_config.format on the core Messages API; outputFormat on query() in the Claude Agent SDK). The model physically cannot emit tokens that violate the schema (constrained decoding). Agent can still use tools during reasoning; the final result is the schema-valid JSON.
- Strict tool use (strict: true on tool definitions). When the model calls a tool, parameters are guaranteed to match the tool's input_schema.
Both are schema-driven, and both remove retries and parsing failures. TypeScript has first-class helpers: zodOutputFormat() and jsonSchemaOutputFormat() from the SDK, plus z.toJSONSchema() from Zod 4.
Docs:
Goal
Let conclave authors define structured outputs on agent nodes via a form in the editor, with no TypeScript written by the author. The engine threads the schema through to the Claude SDK's outputFormat option, returns validated JSON to downstream nodes, and — for multiple sibling specialists feeding one downstream node — runs a cheap comparison check to detect contradictions.
This is reuse of existing primitives: no new node type, no new edge semantics, no new framework. The agent inspector gets one new section, shown only when structured output is enabled on the agent; the Claude path reads one new config field; the MCP layer is untouched.
Out of scope (explicitly)
- No custom "add_finding tool" per-agent. The Agent SDK's native outputFormat subsumes that design: agents use existing tools freely during reasoning, and the SDK enforces the final shape.
- No new "Gate" or "Blocker" node type. Contradiction detection is an engine feature wired automatically when sibling specialists feed one downstream node, not a node authors manually add.
- No LangChain-style code-first framework. Authors never open a code editor to use this.
Phase 0: Verification (before shipping anything)
- Confirm @anthropic-ai/claude-agent-sdk@0.2.91 supports options.outputFormat on query(). If not, bump the SDK. Inspect the Options type in node_modules/@anthropic-ai/claude-agent-sdk/dist/index.d.ts.
- Confirm the tool() helper passes strict: true through (secondary priority, phase 2+ only).
- The SDK lists zod ^4.0.0 as peer dep; we're on zod ^3.24. Pre-existing warning, not blocking here. File separately if it bites.
Phase 1: UI — Structured Output as a droppable internal tool
Scope: packages/client only. No runtime behavior change yet.
UX model: structured output is opt-in, added by dragging a "Structured Output" entry from the node palette onto an agent node — same gesture as adding a built-in tool (Read, Write, Bash) today. Agents that don't need structured output don't see any schema config cluttering their inspector. When the "Structured Output" chip is present on an agent, the inspector surfaces a field builder; when it's absent, the inspector is unchanged.
This is not a graph node — it's a chip/slot that lives inside an agent node, like the existing built-in-tool chips.
- Add outputSchema?: JsonSchemaObject to ResolvedAgentConfig in packages/shared/src/schemas/agent.ts. It stays undefined when no Structured Output chip is attached and becomes the user-built schema when one is.
- Add a "Structured Output" entry to the node palette (packages/client/src/components/editor/node-palette.tsx), in the same section as built-in tools. Icon: something schema-ish (ListChecks or Braces from Lucide).
- Dropping the chip on an agent sets outputSchema: { type: "object", properties: {} } (empty schema, valid-but-useless starting state). The agent inspector grows a new section revealing the field builder. Removing the chip sets outputSchema: undefined.
- Field builder (shown only when outputSchema is set), one row per field: name · type (string/integer/number/boolean/enum) · required checkbox · description · constraints (min/max/minLength/maxLength/enum values).
Rationale for chip-based UX over always-on tab: matches existing palette-driven composition (built-in tools, MCP servers already work this way); keeps the inspector uncluttered for agents that emit prose; makes structured output visible in the graph itself, not buried in a tab. No new concepts — it's a capability you add, just like adding Read or Bash.
Acceptance: user drops a Structured Output chip on a conclave-28 specialist via the palette, defines a few fields in the inspector, saves the conclave, and sees outputSchema populated in oc-dev get_conclave output. Removing the chip clears the schema. Engine still ignores the schema at this phase — validate the UX is sound before the backend lights up.
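The step from field-builder rows to a stored outputSchema could look roughly like the sketch below. It is an illustration only: FieldRow, buildOutputSchema and the exact shape given to JsonSchemaObject here are hypothetical, and only the column names (name, type, required, description, constraints) come from the Phase 1 description.

```typescript
// Hypothetical row model for the inspector's field builder.
type FieldType = "string" | "integer" | "number" | "boolean" | "enum";

interface FieldRow {
  name: string;
  type: FieldType;
  required: boolean;
  description?: string;
  min?: number;
  max?: number;
  minLength?: number;
  maxLength?: number;
  enumValues?: string[];
}

// Assumed shape; the real JsonSchemaObject in packages/shared may differ.
interface JsonSchemaObject {
  type: "object";
  properties: Record<string, Record<string, unknown>>;
  required?: string[];
}

function buildOutputSchema(rows: FieldRow[]): JsonSchemaObject {
  // With zero rows this is the empty starting state set when the chip is dropped.
  const schema: JsonSchemaObject = { type: "object", properties: {} };
  const required: string[] = [];
  for (const row of rows) {
    // Enum fields become an "enum" keyword; everything else maps to "type".
    const prop: Record<string, unknown> =
      row.type === "enum" ? { enum: row.enumValues ?? [] } : { type: row.type };
    if (row.description) prop.description = row.description;
    if (row.min !== undefined) prop.minimum = row.min;
    if (row.max !== undefined) prop.maximum = row.max;
    if (row.minLength !== undefined) prop.minLength = row.minLength;
    if (row.maxLength !== undefined) prop.maxLength = row.maxLength;
    schema.properties[row.name] = prop;
    if (row.required) required.push(row.name);
  }
  if (required.length > 0) schema.required = required;
  return schema;
}
```

Keeping the conversion a pure function of the rows means the saved conclave only ever contains plain JSON Schema, so the server never needs to know the builder exists.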
Phase 2: Server — wire into the Claude path
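Scope: packages/server/src/agent/runtime.ts only.
- In runClaudeAgent, if config.outputSchema is present, pass it through:

    // Only request structured output when the author attached a schema.
    const agentQuery = query({
      prompt,
      options: {
        ...existingOptions,
        ...(config.outputSchema
          ? { outputFormat: { type: "json_schema", schema: config.outputSchema } }
          : {}),
      },
    });

- Read result.structured_output (the validated JSON). If present, set resultOutput to the structured JSON serialized as a string (edges currently carry strings, so keep the data model unchanged for now).
- Handle error_max_structured_output_retries: treat it as agent failure with a clear error message. Emit via onOutput for visibility.
- When a schema is set, the resultOutput prose is no longer propagated; only the structured JSON is. Document this in the inspector UI.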
Acceptance: a conclave-28 specialist with a schema defined in Phase 1 runs, and the Writer receives structured JSON from each specialist instead of free-text review prose.
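The result handling for this phase might be sketched as follows. This is a sketch under assumptions, not the SDK's real types: ResultMessage is a hypothetical local mirror of the final message, and toResultOutput and hasSchema are made-up names. Only structured_output, error_max_structured_output_retries, resultOutput and onOutput are taken from this issue.

```typescript
// Hypothetical stand-in for the SDK's final result message.
type ResultMessage =
  | { type: "result"; subtype: "success"; result: string; structured_output?: unknown }
  | { type: "result"; subtype: "error_max_structured_output_retries" };

// Returns the string to place in resultOutput, or throws on agent failure.
function toResultOutput(
  msg: ResultMessage,
  hasSchema: boolean,
  onOutput: (line: string) => void,
): string {
  if (msg.subtype === "error_max_structured_output_retries") {
    const text = "Agent failed: output never matched outputSchema within the retry limit";
    onOutput(text); // surface the failure in the run log before failing the node
    throw new Error(text);
  }
  if (hasSchema) {
    if (msg.structured_output === undefined) {
      throw new Error("Schema was set but the result carried no structured_output");
    }
    // Edges carry strings, so serialize rather than change the data model.
    return JSON.stringify(msg.structured_output);
  }
  return msg.result; // no schema: free-form prose passes through unchanged
}
```

Throwing when a schema is set but no structured value arrives is a design choice in this sketch; falling back to prose would quietly reintroduce the free-text path the phase is meant to remove.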
Phase 3: Writer schema + default templates
Scope: config-only. No code change beyond seeding a schema into conclave #28's Writer node.
- Define a CodeReviewFindings schema:

    {
      "type": "object",
      "properties": {
        "findings": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "severity": { "enum": ["blocker", "major", "minor", "nit"] },
              "file": { "type": "string" },
              "line": { "type": "integer", "minimum": 1 },
              "description": { "type": "string" },
              "raisedBy": { "type": "string" }
            },
            "required": ["severity", "file", "line", "description", "raisedBy"]
          }
        },
        "counts": {
          "type": "object",
          "properties": {
            "blocker": { "type": "integer", "minimum": 0 },
            "major": { "type": "integer", "minimum": 0 },
            "minor": { "type": "integer", "minimum": 0 },
            "nit": { "type": "integer", "minimum": 0 }
          },
          "required": ["blocker", "major", "minor", "nit"]
        }
      },
      "required": ["findings", "counts"]
    }

- Configure the light_code_review specialists AND the Writer to use this schema.
- Add a consistency check: counts.blocker === findings.filter(f => f.severity === "blocker").length etc. If mismatch, emit a warning. With structured output this can only fail if the model hallucinates, so it's belt-and-suspenders.
Acceptance: rerun the journal-09 and journal-10 experiments. The Writer summary can no longer say "3 blockers" while the body has 0. Goal: zero count-hallucinations across at least 5 runs.
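The count check for this phase is small enough to sketch in full. The types below mirror the CodeReviewFindings fields named in this issue; the function name countMismatches and the warning wording are hypothetical.

```typescript
type Severity = "blocker" | "major" | "minor" | "nit";

interface Finding {
  severity: Severity;
  file: string;
  line: number;
  description: string;
  raisedBy: string;
}

interface CodeReviewFindings {
  findings: Finding[];
  counts: Record<Severity, number>;
}

const SEVERITIES: Severity[] = ["blocker", "major", "minor", "nit"];

// Returns one warning per severity whose claimed count disagrees with the
// number of findings actually listed at that severity.
function countMismatches(review: CodeReviewFindings): string[] {
  const warnings: string[] = [];
  for (const severity of SEVERITIES) {
    const actual = review.findings.filter((f) => f.severity === severity).length;
    const claimed = review.counts[severity];
    if (claimed !== actual) {
      warnings.push(`counts.${severity} is ${claimed} but findings lists ${actual}`);
    }
  }
  return warnings;
}
```

For the journal-09 style failure, a review claiming three blockers with an empty findings array yields the single warning "counts.blocker is 3 but findings lists 0".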
Phase 4: Cross-specialist contradiction detection
Scope: packages/server/src/engine/agent-executor.ts + a small utility module.
- When multiple sibling agents feeding the same downstream node have outputSchema defined with compatible shapes, the engine accumulates their structured outputs into a shared bucket keyed by the downstream node.
- Comparison rule: same (file, line) key with different severity → contradiction → emit event; mark the input with a contradictions: [...] field so the downstream agent sees it as data.
- Optionally, compare description strings when (file, line) matches loosely, to flag "almost-same finding, different wording".
Acceptance: rerun journal-10 builtin-tools matrix. The contradiction between Security ("path traversal blocker") and Conventions ("workspace path resolution prevents path traversal") surfaces as an explicit contradictions entry the Writer sees — not silently accepted.
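The (file, line) comparison rule could be sketched as below, assuming every sibling's findings have already been merged into one array. Contradiction and findContradictions are hypothetical names, and the loose description matching is left out.

```typescript
interface Finding {
  severity: string;
  file: string;
  line: number;
  description: string;
  raisedBy: string;
}

// Hypothetical shape for one entry of the contradictions: [...] field.
interface Contradiction {
  file: string;
  line: number;
  claims: { raisedBy: string; severity: string }[];
}

function findContradictions(findings: Finding[]): Contradiction[] {
  // Group every claim by its exact (file, line) location.
  const byLocation = new Map<string, Contradiction>();
  for (const f of findings) {
    const key = `${f.file}:${f.line}`;
    const entry: Contradiction =
      byLocation.get(key) ?? { file: f.file, line: f.line, claims: [] };
    entry.claims.push({ raisedBy: f.raisedBy, severity: f.severity });
    byLocation.set(key, entry);
  }
  // A location is contradictory when its claims disagree on severity.
  const contradictions: Contradiction[] = [];
  byLocation.forEach((entry) => {
    const severities = new Set(entry.claims.map((c) => c.severity));
    if (severities.size > 1) contradictions.push(entry);
  });
  return contradictions;
}
```

The returned entries can be attached to the downstream input as plain data, so the Writer sees who disagreed and at which severity without extra engine logic.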
Phase 5: Ollama / OpenAI parity
Scope: packages/server/src/agent/{ollama,openai-chat,openai-responses}.ts and AgentBase.
- Ollama: supports format: jsonSchema in newer versions. Translate config.outputSchema to the Ollama request shape. No constrained-decoding guarantee, so validate the response with Zod post-hoc and retry up to N times with the validation error fed back.
- OpenAI: supports response_format: { type: "json_schema", json_schema: {...} }. Analogous implementation.
Acceptance: same Phase-3 experiment works with an Ollama-backed specialist. May have higher error rate; that's expected and documented.
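The validate-and-retry loop for backends without a decoding guarantee might look like the sketch below. Everything here is hypothetical scaffolding: callModel stands in for the Ollama or OpenAI request, and validate is a plain callback so the block stays dependency-free. In the real code a Zod safeParse wrapper could fill that role.

```typescript
type Validation<T> =
  | { status: "valid"; value: T }
  | { status: "invalid"; error: string };

async function generateValidated<T>(
  callModel: (feedback?: string) => Promise<string>,
  validate: (parsed: unknown) => Validation<T>,
  maxRetries: number,
): Promise<T> {
  let feedback: string | undefined;
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(feedback);
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      feedback = "Response was not valid JSON";
      continue;
    }
    const result = validate(parsed);
    if (result.status === "valid") return result.value;
    feedback = result.error; // fed back to the model on the next attempt
  }
  throw new Error(`No schema-valid output after ${maxRetries + 1} attempts: ${feedback}`);
}
```

Passing the previous validation error into the next call is what separates this from a blind retry, and the final throw maps onto the same agent-failure path as the Claude retry-limit error.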
What this does not solve
- Semantic errors: journal-10's "Bash Command Injection is a blocker" false positive was a threat-model misread, not a schema violation. Structured outputs don't catch wrong judgments — only wrong shapes. That still needs a cross-specialist critic pass (out of scope here, tracked separately).
Links
- .notes/09-runtime-ts-review-matrix-and-kb-paradox.md
- .notes/10-builtin-tools-review-matrix.md