Target Workflow: smoke-claude.md
**Source (redacted):** Pre-downloaded run data from `/tmp/gh-aw/token-audit/` (18 runs over 7 days)
Estimated cost per run: $0.071
Total tokens per run: ~185K
Cache read rate: 77% of total tokens
Cache write rate: 22% of total tokens
LLM turns: 6.8 average (range: 5–9)
Model: claude-haiku-4-5
Current Configuration
| Setting | Value |
| --- | --- |
| Tools loaded | github (pull_requests toolset), playwright (all tools), bash (all commands) |
| MCP tools actually used | playwright: browser_navigate, browser_snapshot, browser_evaluate, browser_close, browser_wait_for |
| Network groups | defaults, github, playwright |
| Pre-agent steps | Yes (1: mkdir for playwright logs) |
| Prompt size | ~5,276 chars (incl. injected system block) |
Per-run token breakdown (avg across 18 runs)
| Token type | Average | % of total |
| --- | --- | --- |
| Cache read | 143,634 | 77% |
| Cache write | 39,402 | 22% |
| Input (new) | 38 | <1% |
| Output | 1,520 | <1% |
| Total | 184,594 | 100% |
Recommendations
1. Reduce Turn Count by Combining Operations in the Prompt
Estimated savings: ~25–35K tokens/run (~15–20%)
The workflow currently averages 6.8 turns for 4 tasks. The main driver is separate LLM turns for:
- GitHub MCP call (1 turn)
- Playwright navigate + snapshot (1–2 turns)
- Bash write + bash verify (1–2 turns)
- Safe output add_comment + add_label (1–2 turns)
The prompt should explicitly instruct the agent to batch operations per turn. Add to the prompt:
**Efficiency instructions:**
- Combine file creation and verification into a single bash call: `mkdir -p /tmp/gh-aw/agent && echo "..." > /file && cat /file`
- Use a single playwright turn: navigate AND verify the title in one step (do not take a separate snapshot unless navigate result is insufficient)
- Post the comment and label in a single combined safe-output invocation where possible
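As a concrete sketch of the first bullet, one chained bash invocation replaces what would otherwise be two or three separate turns (the file name and contents below are illustrative, not taken from the workflow):

```shell
# Single bash invocation: create the directory, write the file, and verify it,
# chained with && so a failure at any step surfaces immediately.
# (File name and contents are illustrative.)
mkdir -p /tmp/gh-aw/agent \
  && echo "smoke-test marker" > /tmp/gh-aw/agent/marker.txt \
  && cat /tmp/gh-aw/agent/marker.txt
```

Because `&&` short-circuits, the agent gets both the write and the verification result back in a single tool response.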
This alone could reduce turns from 6.8 → 5, saving ~2 turns × ~13K tokens/turn = ~26K tokens/run (~14%).
2. Replace Playwright Page-Title Check with Bash curl
Estimated savings: ~18–30K tokens/run (~10–16%)
The current smoke test uses Playwright to navigate to https://github.com and verify the page title contains "GitHub". This is trivially always true and consumes 1–2 LLM turns (navigate + snapshot). A curl HEAD check achieves the same signal at near-zero token cost:
Change in smoke-claude.md:
```diff
-2. **Playwright Testing**: Use playwright to navigate to https://github.com and verify the page title contains "GitHub"
+2. **Playwright Testing**: Use bash to run `curl -sI https://github.com | grep -i "content-type"` and verify the response contains `text/html`. This tests HTTPS connectivity without requiring a full browser session.
```
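The grep logic of the proposed check can be sanity-tested offline against a canned HEAD response (the header text below is illustrative; in the workflow it would come from `curl -sI https://github.com`):

```shell
# Canned headers standing in for `curl -sI https://github.com` output
headers='HTTP/2 200
content-type: text/html; charset=utf-8
server: github.com'
# Same pipeline as the proposed test: isolate content-type, assert text/html
if printf '%s\n' "$headers" | grep -i "content-type" | grep -qi "text/html"; then
  echo "smoke: PASS"
else
  echo "smoke: FAIL"
fi
```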
And remove the playwright tool and network group entirely:
```diff
 network:
   allowed:
     - defaults
     - github
-    - playwright
 tools:
   github:
     toolsets: [pull_requests]
-  playwright:
   bash:
     - "*"
```
⚠️ Tradeoff: This changes the nature of the test from browser-based to HTTP-only. If the intent is specifically to validate the Playwright MCP server, keep Playwright but apply recommendation #1 to reduce its turn count.
If Playwright is kept, at minimum ensure the test uses only 1 turn:
2. **Playwright Testing**: Navigate to https://github.com using `playwright_browser_navigate`. The tool returns the page title — verify it contains "GitHub" from the navigate response directly. Do NOT call browser_snapshot separately.
3. Add Explicit Tool Use Guidance to Reduce Extraneous MCP Calls
Estimated savings: ~5–10K tokens/run (~3–5%)
From the run data, playwright_browser_snapshot is called in only 7/18 runs — indicating inconsistent agent behavior. Explicit instructions prevent unnecessary tool calls:
Add to the prompt body:
**Tool constraints:**
- GitHub: use `list_pull_requests` with `perPage: 2, state: closed` only — do not fetch PR details or reviews
- Playwright: call `browser_navigate` only — do not call `browser_snapshot` unless navigate returns no title
- Bash: chain commands with `&&` to minimize turns
4. Pre-Compute the Playwright Log Directory in the Existing Step
Estimated savings: Negligible tokens, but removes runtime ambiguity
The current steps: already creates the playwright log dir. This is correct. No change needed here — it's already optimized.
Cache Analysis (Anthropic-Specific)
| Metric | Value |
| --- | --- |
| Avg cache write/run | 39,402 tokens |
| Avg cache read/run | 143,634 tokens |
| Cache read : write ratio | ~3.6× |
| Runs analyzed | 18 |
Cache write amortization: The 39K cache write on turn 1 (system prompt + tool schemas) is reused across ~6.8 turns of cache reads per run. At a 3.6× read:write ratio, the cache pays for itself within ~2 turns — caching is working well here.
Cache cost vs benefit: For Haiku pricing, cache writes are ~1.25× the cost of regular input tokens. With 6.8 turns reusing the cache, each write token is read ~3.6× — the savings are real and caching should be kept.
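A back-of-envelope check of that claim, assuming the commonly published Anthropic multipliers (cache write ≈ 1.25× and cache read ≈ 0.1× the base input price — treat the exact multipliers as assumptions, not values from this audit):

```shell
# Compare cached vs uncached cost for the measured per-run averages,
# expressed in base-input-token cost units.
awk 'BEGIN {
  write = 39402            # avg cache-write tokens/run (from the table above)
  read  = 143634           # avg cache-read tokens/run
  cached   = write * 1.25 + read * 0.10   # assumed write/read price multipliers
  uncached = write + read                 # same tokens billed as plain input
  printf "cached: %.0f units, uncached: %.0f units\n", cached, uncached
}'
```

Under these assumptions caching costs roughly a third of the uncached equivalent, which is consistent with keeping it enabled.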
Cache hit consistency: 77% cache read rate is healthy. The run-to-run variation in total tokens (119K–325K) is driven by turn count variation (5–9 turns), not cache misses.
What drives cache write size (~39K)? The system prompt (security policy, safe-outputs instructions, tool schemas) accounts for the bulk. This cannot be reduced in the workflow file itself — it's framework-injected.
Expected Impact
| Metric | Current | Projected | Savings |
| --- | --- | --- | --- |
| Total tokens/run | ~185K | ~130–140K | ~25–30% |
| Cost/run | $0.071 | $0.050–0.053 | ~25–30% |
| LLM turns | 6.8 | 5.0 | −1.8 turns |
| Session time | ~3.6m | ~2.8m (est.) | ~20% |
| Monthly cost (18 runs/week) | ~$20.50 | ~$14–15 | ~$5–6/mo |
Implementation Checklist
- [ ] Update the smoke-claude.md prompt body (Rec #1): combine bash ops, single playwright turn
- [ ] Recompile: `gh aw compile .github/workflows/smoke-claude.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Verify the reduced turn count on the next run (`agent_usage.json` in logs)

Generated by Daily Claude Token Optimization Advisor