You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I searched existing issues and this has not been proposed before
(Closest prior art, checked open and closed: #1631 covers the model half of this — workers inherit the parent model — and the #1744/#1717/#1716 cost-tiering stack fixes which model each role runs on, with zero mentions of effort. #1120 is review cadence. This proposal is the reasoning-effort dimension of the same cost problem; if maintainers consider it subsumed by the #1744 direction, happy to close.)
What problem does this solve?
Running an SDD-patterned pipeline on Claude Code, I noticed every subagent — implementer, spec reviewer, quality reviewer — runs at the session's effort configuration, because effort can't be set at dispatch: Claude Code's Task tool can override model per-invocation but not effort — the only per-subagent effort control is custom agent-definition frontmatter — and SDD dispatches via prompt templates on general-purpose agents.
The misallocation goes both ways. writing-plans produces 2–5 minute tasks with complete code in the plan; a high-effort session spends high-effort reasoning transcribing them (the effort-shaped twin of #1744's measured "most capable model for transcription-grade tasks" failure, which its model fix doesn't touch). Meanwhile a default-effort session runs the final whole-branch review — the dispatch #1744 deliberately pins to the most capable model — at default effort.
This is not Claude Code-specific; the harnesses just sit at different points:
Claude Code: effort (low/medium/high/xhigh/max; levels vary by model — Sonnet 4.6 has no xhigh, and unsupported levels fall back to the nearest supported level below). It is frontmatter-only, so per-role control requires pre-defined agent files. We measured the delivery path at the API request level (logging proxy, CC 2.1.176): subagent calls carry output_config.effort from the agent's frontmatter, overriding the session level (isolated: an effort: low agent under an effort-high session produced "effort":"low" on the wire), while the same calls are sent with thinking: {"type": "disabled"} — so prompt-level thinking cues are inert for subagents, and Haiku (which has no effort parameter) is left with no per-role reasoning control at all (see [FEATURE] enable extended thinking for subagents, currently not possible anthropics/claude-code#14321).
Codex: model_reasoning_effort (minimal/low/medium/high/xhigh) is a first-class per-custom-agent key in agent TOML files, inheriting unset keys from the parent session. The knob SDD would need already exists natively — it's simply unused.
Gemini CLI: Gemini 3's thinking_level (low/high/dynamic) exists at the API level, but the harness exposes reasoning depth at session/model-selection level; no per-role mechanism I could find.
SDD already encodes "capability should match the task" for models (Model Selection guidance, now the #1744 tiering). Effort is the same dial's second axis, currently welded to the session on every harness.
Proposed solution
Add an effort dimension to SDD's dispatch guidance, implemented per-harness where mechanisms exist:
Claude Code: ship effort-pinned agent definitions in the plugin's agents/ dir for the roles SDD uses (e.g. implementer-sonnet-med, quality-reviewer-opus-high), dispatched by name, with today's inline templates as the fallback path. Each file is ~15 lines of frontmatter plus the existing role prompt.
Codex: set model_reasoning_effort in the role's custom-agent config alongside the model choice the cost-tiering work already makes.
Other harnesses: no change until a per-role mechanism exists; optionally document the session-effort interaction ("SDD sessions at xhigh pay an effort tax on every micro-task").
Deliberately a sketch, not a design — the problem statement is the contribution; the right shape (which roles get which defaults, whether the plan's complexity tags drive it) belongs to the eval harness.
What alternatives did you consider?
Session-effort guidance only (docs, zero mechanism): can't express "low for implementers, high for final review" simultaneously, which is the actual win.
Do nothing: every long autonomous session keeps paying session-effort prices on transcription-grade tasks.
Is this appropriate for core Superpowers?
Yes with a caveat: the problem and guidance are harness-general and benefit any project type; the mechanisms are harness-conditional (first-class on Codex, agent files on Claude Code, unavailable elsewhere). Same shape as the existing per-harness model-selection guidance.
Environment (required)
Field
Value
Superpowers version
5.1.0 (installed, currently disabled — see Context for why experience comes from an SDD-patterned plugin instead)
Harness (Claude Code, Cursor, etc.)
Claude Code
Harness version
2.1.176
Your model + version
This issue was drafted by Claude Fable 5 (claude-fable-5, claude.ai), directed and reviewed by the human submitter. Sessions where the problem surfaced ran haiku, sonnet, opus, and fable subagents (agent-loops agent frontmatter); default session model claude-fable-5[1m] (1M-context variant)
Drafting disclosure: written collaboratively — Claude Fable 5 drafted and verified mechanics (docs + issue-tracker sweep); the human submitter directed scope and reviewed every claim. Experience base, stated plainly: this comes from building and running a private Claude Code plugin patterned on SDD (fresh subagent per task, staged reviews) where we solved the effort gap with role×model×effort agent definitions — not from running superpowers' SDD skill itself. Mechanics above were verified against current Claude Code and Codex documentation; the Claude Code effort/thinking delivery claims were additionally measured directly at the API request level with a logging proxy. We have not yet run a controlled cost/quality comparison of effort-pinned vs uniform-effort roles — happy to run one (same plan, both rosters, n≈4 per arm, pre-registered prediction) and bring the numbers if that would help decide whether this is worth pursuing.
(Closest prior art, checked open and closed: #1631 covers the model half of this — workers inherit the parent model — and the #1744/#1717/#1716 cost-tiering stack fixes which model each role runs on, with zero mentions of effort. #1120 is review cadence. This proposal is the reasoning-effort dimension of the same cost problem; if maintainers consider it subsumed by the #1744 direction, happy to close.)
What problem does this solve?
Running an SDD-patterned pipeline on Claude Code, I noticed every subagent — implementer, spec reviewer, quality reviewer — runs at the session's effort configuration, because effort can't be set at dispatch: Claude Code's Task tool can override
modelper-invocation but noteffort— the only per-subagent effort control is custom agent-definition frontmatter — and SDD dispatches via prompt templates on general-purpose agents.The misallocation goes both ways.
writing-plansproduces 2–5 minute tasks with complete code in the plan; a high-effort session spends high-effort reasoning transcribing them (the effort-shaped twin of #1744's measured "most capable model for transcription-grade tasks" failure, which its model fix doesn't touch). Meanwhile a default-effort session runs the final whole-branch review — the dispatch #1744 deliberately pins to the most capable model — at default effort.This is not Claude Code-specific; the harnesses just sit at different points:
effort(low/medium/high/xhigh/max; levels vary by model — Sonnet 4.6 has no xhigh, and unsupported levels fall back to the nearest supported level below). It is frontmatter-only, so per-role control requires pre-defined agent files. We measured the delivery path at the API request level (logging proxy, CC 2.1.176): subagent calls carryoutput_config.effortfrom the agent's frontmatter, overriding the session level (isolated: aneffort: lowagent under an effort-high session produced"effort":"low"on the wire), while the same calls are sent withthinking: {"type": "disabled"}— so prompt-level thinking cues are inert for subagents, and Haiku (which has no effort parameter) is left with no per-role reasoning control at all (see [FEATURE] enable extended thinking for subagents, currently not possible anthropics/claude-code#14321).model_reasoning_effort(minimal/low/medium/high/xhigh) is a first-class per-custom-agent key in agent TOML files, inheriting unset keys from the parent session. The knob SDD would need already exists natively — it's simply unused.thinking_level(low/high/dynamic) exists at the API level, but the harness exposes reasoning depth at session/model-selection level; no per-role mechanism I could find.SDD already encodes "capability should match the task" for models (Model Selection guidance, now the #1744 tiering). Effort is the same dial's second axis, currently welded to the session on every harness.
Proposed solution
Add an effort dimension to SDD's dispatch guidance, implemented per-harness where mechanisms exist:
agents/dir for the roles SDD uses (e.g.implementer-sonnet-med,quality-reviewer-opus-high), dispatched by name, with today's inline templates as the fallback path. Each file is ~15 lines of frontmatter plus the existing role prompt.model_reasoning_effortin the role's custom-agent config alongside the model choice the cost-tiering work already makes.Deliberately a sketch, not a design — the problem statement is the contribution; the right shape (which roles get which defaults, whether the plan's complexity tags drive it) belongs to the eval harness.
What alternatives did you consider?
efforton Claude Code's Task tool: would dissolve the problem cleanly, and is already requested upstream (Feature: effort/thinking configuration for Task tool subagents anthropics/claude-code#25669, #43083; #14321 is the thinking-flavored sibling) with no movement yet; agent files work today.Is this appropriate for core Superpowers?
Yes with a caveat: the problem and guidance are harness-general and benefit any project type; the mechanisms are harness-conditional (first-class on Codex, agent files on Claude Code, unavailable elsewhere). Same shape as the existing per-harness model-selection guidance.
Environment (required)
Context
Drafting disclosure: written collaboratively — Claude Fable 5 drafted and verified mechanics (docs + issue-tracker sweep); the human submitter directed scope and reviewed every claim. Experience base, stated plainly: this comes from building and running a private Claude Code plugin patterned on SDD (fresh subagent per task, staged reviews) where we solved the effort gap with role×model×effort agent definitions — not from running superpowers' SDD skill itself. Mechanics above were verified against current Claude Code and Codex documentation; the Claude Code effort/thinking delivery claims were additionally measured directly at the API request level with a logging proxy. We have not yet run a controlled cost/quality comparison of effort-pinned vs uniform-effort roles — happy to run one (same plan, both rosters, n≈4 per arm, pre-registered prediction) and bring the numbers if that would help decide whether this is worth pursuing.