[workflow-analysis] Weekly Workflow Analysis — 2026-05-04 #30128
Overview
This report covers 27 workflow runs across the past 7 days (2026-04-27 to 2026-05-04), totaling 4.5 hours of execution time and 37.4M tokens consumed at an estimated cost of $10.54. The overall success rate is 96.3% (26/27 completed), with 1 failure due to an infrastructure environment issue.
Key Metrics
Critical Issue: Daily News Failure
Workflow: `daily-news` · Engine: copilot · Run: #25311165057

Root cause: Node.js was not reachable inside the AWF chroot at runtime, despite being installed on the host runner. The `awf-agent` container uses chroot mode (`chroot_mode: true`), which requires Node.js to be present at a path that is bind-mounted into `/host`. The `agent` job failed; the `upload_assets`, `detection`, `safe_outputs`, and `push_repo_memory` jobs were all skipped as downstream dependents.

Recommendation: Verify that the Node.js installation path (e.g. `/opt/hostedtoolcache/node/...`) is correctly bind-mounted into `/host` for chroot-mode copilot runs. Consider adding a pre-flight check that validates `node` availability inside the chroot before starting the agent, with a descriptive failure message pointing to the setup action.

Performance Analysis
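Such a pre-flight check could be a small helper run before the agent starts. The sketch below is a hypothetical illustration, not part of AWF: `preflight_node` and the hosted-toolcache layout (`opt/hostedtoolcache/node/<version>/<arch>/bin/node` under the chroot root) are assumptions based on the failure described above.

```python
import glob
import os

def preflight_node(chroot_root: str) -> bool:
    """Return True if an executable `node` binary is reachable under the
    bind-mounted toolcache inside the chroot root.

    Hypothetical helper: the mount point and toolcache layout are
    assumptions, not actual AWF configuration names.
    """
    pattern = os.path.join(chroot_root, "opt", "hostedtoolcache", "node", "**", "node")
    return any(
        os.path.isfile(p) and os.access(p, os.X_OK)
        for p in glob.glob(pattern, recursive=True)
    )
```

Calling `preflight_node("/host")` before launching the agent would let the run fail fast with a message pointing at the setup action, instead of failing opaquely mid-run.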
Top Token Consumers (full table)
Longest-Running Workflows
Cache Effectiveness
Claude (excellent — 80–86% cache hit rate)
All 7 Claude runs with measurable tokens demonstrate outstanding prompt cache utilization, reducing effective token costs by 80–86%:
This means Claude workflows effectively receive an ~84% discount on long multi-turn conversations; the caching is working exactly as designed.
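The discount figure can be read as the share of raw tokens served from the prompt cache. A minimal sketch of that arithmetic, assuming effective-token accounting counts cached tokens at zero weight (an assumption; the report does not state the exact formula):

```python
def cache_discount(raw_tokens: int, cached_tokens: int) -> float:
    """Fraction of raw tokens served from the prompt cache, i.e. the
    'discount' quoted above, assuming cached tokens count at zero weight
    toward effective tokens (an assumed accounting model)."""
    return cached_tokens / raw_tokens

# e.g. a run with 5.0M raw tokens of which 4.2M were cache hits
# sits at the middle of the observed 80-86% range
```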
Copilot (no caching; effective > raw tokens)
Copilot runs show zero prompt caching and a consistent ratio of effective_tokens/raw_tokens ≈ 1.11–1.17. This is expected behavior for copilot's token accounting (effective tokens include output weighting), but it means copilot conversations do not benefit from prefix caching the way Claude does. The one exception is Issue Monster, which has an unusually low ratio (~0.11), likely because it exits early after finding no issues.
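To make the ratio concrete, here is an illustrative model of output-weighted accounting. The 4x output weight is a hypothetical value chosen only to show how an effective/raw ratio of ~1.1–1.2 can arise without any caching; copilot's real formula is not documented in this report.

```python
def effective_tokens(input_tokens: int, output_tokens: int,
                     output_weight: float = 4.0) -> int:
    """Effective tokens with output weighted more heavily than input.
    The 4x weight is hypothetical, for illustration only."""
    return input_tokens + int(output_weight * output_tokens)

# A run with 100k input and 5k output tokens (105k raw) would report
# 120k effective tokens, a ratio of ~1.14, inside the observed band.
```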
Recommendation: For high-turn copilot workflows (Daily Syntax Error Quality Check at 112 turns, Contribution Check at 49 turns), consider whether the workflow can be restructured to reduce iteration count — e.g., batching checks per file group rather than one turn per file.
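The batching idea can be sketched as a simple chunking step; `batch_files` and the batch size are illustrative, not existing workflow settings.

```python
def batch_files(files: list[str], batch_size: int = 10) -> list[list[str]]:
    """Split a file list into fixed-size batches so each agent turn
    checks one batch instead of one file. `batch_size` is an
    illustrative knob, not an AWF configuration option."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
```

At 10 files per batch, a pass that previously took one turn per file across 112 files would need only 12 turns.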
Optimization Opportunities
1. Daily Syntax Error Quality Check — High Turn Count
2. Daily AW Cross-Repo Compile Check — Long Duration
3. Copilot Session Insights — Wall Time Outlier
4. Release Workflow — Long Duration, Medium Turns
5. Ambient Context Inefficiency (Copilot)
All copilot runs report `cached_tokens: 0` in their ambient context.

Reliability Metrics
Recommendations Summary
References: