[workflow-analysis] Weekly Workflow Analysis — 2026-05-04 #30128
Overview
This report covers 27 workflow runs across the past 7 days (2026-04-27 to 2026-05-04), totaling 4.5 hours of execution time and 37.4M tokens consumed at an estimated cost of $10.54. The overall success rate is 96.3% (26/27 completed), with 1 failure due to an infrastructure environment issue.
Key Metrics
Critical Issue: Daily News Failure
Workflow: `daily-news` · Engine: copilot · Run: #25311165057

Root cause: Node.js was not reachable inside the AWF chroot at runtime, despite being installed on the host runner. The `awf-agent` container uses chroot mode (`chroot_mode: true`), which requires Node.js to be present at a path that is bind-mounted into `/host`. The `agent` job failed; the `upload_assets`, `detection`, `safe_outputs`, and `push_repo_memory` jobs were all skipped as downstream dependents.

Recommendation: Verify that the Node.js installation path (e.g. `/opt/hostedtoolcache/node/...`) is correctly bind-mounted into `/host` for chroot-mode copilot runs. Consider adding a pre-flight check that validates `node` availability inside the chroot before starting the agent, with a descriptive failure message pointing to the setup action.

Performance Analysis
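Such a pre-flight check could be a small helper run before the agent starts. The sketch below is a hypothetical illustration, not part of AWF: `preflight_node` and the hosted-toolcache layout (`opt/hostedtoolcache/node/<version>/<arch>/bin/node` under the chroot root) are assumptions based on the failure described above.

```python
import glob
import os

def preflight_node(chroot_root: str) -> bool:
    """Return True if an executable `node` binary is reachable under the
    bind-mounted toolcache inside the chroot root.

    Hypothetical helper: the mount point and toolcache layout are
    assumptions, not actual AWF configuration names.
    """
    pattern = os.path.join(chroot_root, "opt", "hostedtoolcache", "node", "**", "node")
    return any(
        os.path.isfile(p) and os.access(p, os.X_OK)
        for p in glob.glob(pattern, recursive=True)
    )
```

Calling `preflight_node("/host")` before launching the agent would let the run fail fast with a message pointing at the setup action, instead of failing opaquely mid-run.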
Top Token Consumers (full table)
Longest-Running Workflows
Cache Effectiveness
Claude (excellent — 80–86% cache hit rate)
All 7 Claude runs with measurable tokens demonstrate outstanding prompt cache utilization, reducing effective token costs by 80–86%:
This means Claude workflows effectively receive an ~84% discount on long multi-turn conversations; the caching is working exactly as designed.
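The discount figure can be read as the share of raw tokens served from the prompt cache. A minimal sketch of that arithmetic, assuming effective-token accounting counts cached tokens at zero weight (an assumption; the report does not state the exact formula):

```python
def cache_discount(raw_tokens: int, cached_tokens: int) -> float:
    """Fraction of raw tokens served from the prompt cache, i.e. the
    'discount' quoted above, assuming cached tokens count at zero weight
    toward effective tokens (an assumed accounting model)."""
    return cached_tokens / raw_tokens

# e.g. a run with 5.0M raw tokens of which 4.2M were cache hits
# sits at the middle of the observed 80-86% range
```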
Copilot (no caching; effective > raw tokens)
Copilot runs show zero prompt caching and a consistent ratio of effective_tokens/raw_tokens ≈ 1.11–1.17. This is expected behavior for copilot's token accounting (effective tokens include output weighting), but it means copilot conversations do not benefit from prefix caching the way Claude does. The one exception is Issue Monster, which has an unusually low ratio (~0.11), likely because it exits early after finding no issues.
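To make the ratio concrete, here is an illustrative model of output-weighted accounting. The 4x output weight is a hypothetical value chosen only to show how an effective/raw ratio of ~1.1–1.2 can arise without any caching; copilot's real formula is not documented in this report.

```python
def effective_tokens(input_tokens: int, output_tokens: int,
                     output_weight: float = 4.0) -> int:
    """Effective tokens with output weighted more heavily than input.
    The 4x weight is hypothetical, for illustration only."""
    return input_tokens + int(output_weight * output_tokens)

# A run with 100k input and 5k output tokens (105k raw) would report
# 120k effective tokens, a ratio of ~1.14, inside the observed band.
```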
Recommendation: For high-turn copilot workflows (Daily Syntax Error Quality Check at 112 turns, Contribution Check at 49 turns), consider whether the workflow can be restructured to reduce iteration count — e.g., batching checks per file group rather than one turn per file.
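The batching idea can be sketched as a simple chunking step; `batch_files` and the batch size are illustrative, not existing workflow settings.

```python
def batch_files(files: list[str], batch_size: int = 10) -> list[list[str]]:
    """Split a file list into fixed-size batches so each agent turn
    checks one batch instead of one file. `batch_size` is an
    illustrative knob, not an AWF configuration option."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
```

At 10 files per batch, a pass that previously took one turn per file across 112 files would need only 12 turns.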
Optimization Opportunities
1. Daily Syntax Error Quality Check — High Turn Count
2. Daily AW Cross-Repo Compile Check — Long Duration
3. Copilot Session Insights — Wall Time Outlier
4. Release Workflow — Long Duration, Medium Turns
5. Ambient Context Inefficiency (Copilot)
All copilot runs report `cached_tokens: 0` in their ambient context.

Reliability Metrics
Recommendations Summary
References: