Summary
Analysis Period: 2026-04-15 to 2026-05-04 (last ~20 days)
Total PRs Analyzed: 1,000
Clusters Identified: 8
Overall Merge Rate: 77.5% (775/1,000)
Workflow Run: §25313820803
The 1,000 most recent Copilot-created PRs were clustered using TF-IDF vectorization on PR titles and bodies, with k=8 selected via silhouette scoring. The repository shows consistent high-velocity agentic work — averaging 50 PRs/day — spanning eight distinct task themes dominated by general GitHub/workflow improvements (33%), documentation/testing tasks (24%), and MCP-related refactors (11%).
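The selection of k by silhouette score can be illustrated with a small, standard-library-only sketch. The report does not say which clustering algorithm, tokenizer, or TF-IDF weighting was used, so the k-means step, the whitespace tokenizer, the smoothed IDF, the cosine distance, and the sample titles below are all assumptions made for illustration. In practice a library such as scikit-learn (`TfidfVectorizer`, `KMeans`, `silhouette_score`) would be the more typical choice.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each text into an L2-normalised sparse TF-IDF vector (a dict)."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenised:
        # Smoothed IDF (assumption: the report does not state its weighting).
        vec = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity for two unit-length sparse vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=10):
    """Tiny k-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(cosine_distance(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)  # adds the float weights term by term
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm:  # keep the old centroid if the cluster emptied out
                centroids[c] = {t: w / norm for t, w in total.items()}
    return labels


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient over all points, using cosine distance."""
    scores = []
    for i, v in enumerate(vectors):
        dists = {}
        for j, w in enumerate(vectors):
            if i != j:
                dists.setdefault(labels[j], []).append(cosine_distance(v, w))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster or a single cluster overall
            continue
        a = sum(own) / len(own)                            # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(vectors, candidates):
    """Return (best_k, best_score) among the candidate cluster counts."""
    scored = [(silhouette_score(vectors, kmeans(vectors, k)), k) for k in candidates]
    best_score, best_k = max(scored)
    return best_k, best_score


# Hypothetical PR titles, not taken from the repository.
titles = [
    "fix mcp gateway auth",
    "fix mcp gateway config",
    "refactor mcp gateway auth",
    "add test coverage docs",
    "add test coverage cache",
    "update test coverage docs",
]
vectors = tfidf_vectors(titles)
best_k, best_score = choose_k(vectors, candidates=range(2, 5))
print(f"best k = {best_k}, silhouette = {best_score:.3f}")
```

The same loop scales to the real corpus by widening `candidates`; the report's k=8 would be whichever candidate maximizes the score on the actual PR titles and bodies.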
### Cluster Analysis — 8 Task Themes
All clusters had healthy merge rates (68–82%). Notable patterns below.
Cluster 1 — GitHub Workflow & Agentics Maintenance (326 PRs, 75% merged)
The largest cluster covers broad workflow-level improvements: runner config, tag pinning, action version updates, and agentic workflow compilation fixes. It is a mix of bug fixes and maintenance, with a solid 75% merge rate that sits slightly below the 77.5% overall rate.
Top terms: workflow, github, agentic, request, pull
Top categories: Bug Fix (34%), Other (33%), Add (14%)
Example PRs: #28150, #28533, #29031
Cluster 2 — Tests, Docs, Cache & Copilot Prompts (238 PRs, 82% merged)
Second-largest cluster covers test coverage additions, documentation improvements, cache/memory handling, and Copilot prompt/instruction tuning. High merge rate signals these are well-understood, incremental improvements.
Top terms: test, docs, cache, memory, copilot, perf
Top categories: Bug Fix (43%), Other (33%), Docs (8%)
Example PRs: #28956, #28161, #26507
Cluster 3 — MCP Server Refactors & Gateway Fixes (112 PRs, 79% merged)
MCP-focused work: fixing MCP config formats, refactoring validators, resolving gateway auth issues, and separating GitHub-specific logic. The high refactor proportion (37%) indicates active architectural evolution of the MCP layer.
Top terms: mcp, refactor, gateway, github, shared
Top categories: Refactor (37%), Bug Fix (24%), Other (20%)
Example PRs: #27722, #26585, #27102
Cluster 4 — CLI & MCP Version Management (74 PRs, 81% merged)
CLI consistency improvements, help-text normalization, version bumps, and MCP CLI bridge enhancements. High merge rate and small-to-medium change sets suggest reliable, well-scoped tasks.
Top terms: cli, mcp, copilot, help, claude, bump
Top categories: Bug Fix (35%), Chore/Deps (18%), Other (16%)
Example PRs: #28842, #26715, #26558
Cluster 5 — Feature Development (71 PRs, 76% merged)
Pure feature additions tagged with `feat:`, covering new audit commands, schema extensions, experiment infrastructure, and engine capabilities. Slightly lower merge rate than bug-fix clusters, consistent with higher complexity.
Top terms: feat, audit, schema, command, experiments, engine
Top categories: Feature (99%)
Example PRs: #28913, #29783, #26594
Cluster 6 — Safe-Outputs System (65 PRs, 68% merged)
The lowest merge rate cluster — safe-outputs validation, noop guidance, manifest alignment, and pull-request-level output constraints. The lower success rate may reflect the complexity of coordinating output semantics across many workflows.
Top terms: safe, outputs, output, pull, request
Top categories: Bug Fix (29%), Other (23%), Add (17%)
Example PRs: #27479, #29270, #29269
Cluster 7 — Daily Agentic Workflows (60 PRs, 75% merged)
Daily scheduled workflow management: adding new daily checks, rebalancing engine assignments, refactoring shared `daily-*` base imports, and recompiling lock files. Steady cadence of infrastructure maintenance.
Top terms: daily, workflow, report, optimizer, workflows
Top categories: Other (33%), Bug Fix (23%), Feature (22%)
Example PRs: #28434, #30001, #29787
Cluster 8 — Pre-Agent & Sub-Agent Infrastructure (54 PRs, 81% merged)
Agent lifecycle improvements: pre-agent sanitization, sub-agent orchestration, OTLP span instrumentation, manifest handling, and activation flow fixes. High merge rate indicates tight, targeted changes.
Top terms: agent, pre, feat, steps, sub, engine
Top categories: Other (28%), Bug Fix (26%), Feature (24%)
Example PRs: #29420, #28290, #29668
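As a consistency check, the overall merge rate can be recomputed as the size-weighted mean of the per-cluster rates. The sketch below uses only the sizes and rounded percentages from the cluster headings, so the result is an estimate rather than an exact count.

```python
# (size, merge rate) per cluster, copied from the cluster headings.
clusters = {
    1: (326, 0.75),
    2: (238, 0.82),
    3: (112, 0.79),
    4: (74, 0.81),
    5: (71, 0.76),
    6: (65, 0.68),
    7: (60, 0.75),
    8: (54, 0.81),
}

total_prs = sum(size for size, _ in clusters.values())
# Per-cluster rates are rounded, so this is an estimate of merged PRs.
estimated_merged = sum(size * rate for size, rate in clusters.values())
overall_rate = estimated_merged / total_prs

print(f"{total_prs} PRs, ~{estimated_merged:.0f} merged, {overall_rate:.1%} overall")
# prints: 1000 PRs, ~775 merged, 77.5% overall
```

The estimate agrees with the 77.5% (775/1,000) figure in the summary, which suggests the cluster sizes and rates are internally consistent.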
### Success Rate by Task Category
Categorized by conventional commit prefix / title pattern:
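[Table not preserved in this copy; the surviving fragments show only that rows were keyed by prefixes such as `fix:` and `feat:`.]
### Daily Activity (PR volume per day)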
Average: ~50 PRs/day. Peak on 2026-04-16 (84 PRs), with weekday patterns visible (dips on 2026-04-26/27).
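The daily average follows directly from the analysis period and the PR count. A minimal sketch, assuming the period is inclusive of both end dates, also shows which weekdays the two dip dates fall on:

```python
from datetime import date

start, end = date(2026, 4, 15), date(2026, 5, 4)
total_prs = 1000

days = (end - start).days + 1          # inclusive of both end dates
average_per_day = total_prs / days

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
dip_weekdays = [WEEKDAYS[d.weekday()] for d in (date(2026, 4, 26), date(2026, 4, 27))]

print(days, average_per_day, dip_weekdays)
# prints: 20 50.0 ['Sunday', 'Monday']
```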
### Recent PRs Sample (last 30)
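[Table not fully preserved; the surviving title fragments are listed below, with elided text marked …]
… `$INSTRUCTION` assertion in TestEngineArgsIntegrationCodex
… `owner` field from experiments
… `experiments.NAME == "value"` syntax in experiment docs
… `endpoint` field
feat: add hidden `experiments` command to read experiment state (Feature, ✅ merged, +1045/-1)
Key Findings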
High throughput, strong success rate: 1,000 PRs in ~20 days at 77.5% overall merge rate, with most clusters achieving 75–82%. The agentic workflow system is operating at scale with high quality.
WIP/Investigation tasks are outliers: 42 PRs tagged `[WIP]` had only a 9.5% merge rate. These serve as exploratory probes, investigation branches, or staging areas that rarely land directly. This is by design but represents ~4% of volume.
Tests/Coverage tasks are highest-confidence: The 13 test-focused PRs achieved a 92.3% merge rate, the highest of any category. Well-scoped testing improvements are highly predictable.
Safe-outputs cluster has the most friction: Cluster 6 (safe-outputs) has the lowest merge rate (68%) and concentrates bug fixes and output constraint issues. This is the most complex cross-cutting subsystem and may benefit from more targeted prompt engineering.
MCP and refactor work is clean and reliable: Clusters 3, 4, and 8 (MCP, CLI, agent infrastructure) have 79–81% merge rates, and Cluster 3 reaches 79% even with a 37% refactor share, suggesting well-structured decomposition tasks.
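The category-level percentages in these findings are rounded, but the underlying counts can be backed out from the totals. A small sketch, where `merged_count` is a hypothetical helper introduced only for this check:

```python
def merged_count(total, rate_percent):
    """Nearest whole number of merged PRs implied by a rounded percentage."""
    return round(total * rate_percent / 100)


# (label, total PRs, reported merge rate in percent), taken from the findings.
for label, total, rate in [("[WIP]", 42, 9.5), ("tests/coverage", 13, 92.3)]:
    merged = merged_count(total, rate)
    print(f"{label}: {merged}/{total} merged = {merged / total:.1%}")
# prints:
# [WIP]: 4/42 merged = 9.5%
# tests/coverage: 12/13 merged = 92.3%
```

Both reported rates round-trip cleanly, which implies roughly 4 of the 42 `[WIP]` PRs and 12 of the 13 test-focused PRs were merged. With only 13 PRs in the test category, a single additional unmerged PR would move its rate by about 7.7 percentage points.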
Recommendations
Improve WIP task outcomes: WIP/Investigation PRs rarely convert. Consider a policy of converting WIP findings into scoped follow-up tasks rather than leaving them as dead-end PRs (at a 9.5% merge rate, roughly 38 of the 42 WIP PRs went unmerged, about 4% of total volume).
Invest in safe-outputs prompt specificity: The safe-outputs cluster consistently underperforms. Tighter pre-agent context (e.g., pre-fetching current safe-output config, providing explicit constraint checklists) could reduce the fix/retry loop in this area.
Standardize conventional commit usage: 27% of PRs use no conventional prefix, making categorization and automation harder. Consistent `feat:`/`fix:`/`refactor:` prefixes would improve routing and reporting accuracy.
Leverage the tests/coverage pattern: Test-focused tasks have the highest merge rate. When introducing new features or refactors, pairing them with an explicit test-coverage sub-task appears to produce the most reliable outcomes.
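Categorizing PRs by conventional commit prefix, as this report does, could be sketched roughly as follows. The report does not list its actual rules, so the prefix-to-category mapping, the regular expression, and the sample titles are assumptions; only the category names and the first sample title come from the report.

```python
import re

# Hypothetical mapping from conventional-commit type to report category.
PREFIX_CATEGORIES = {
    "fix": "Bug Fix",
    "feat": "Feature",
    "refactor": "Refactor",
    "docs": "Docs",
    "test": "Tests/Coverage",
    "chore": "Chore/Deps",
}

# Matches "type: ...", "type(scope): ..." and "type!: ..." at the start of a title.
PREFIX_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?!?:\s", re.IGNORECASE)


def categorize(title):
    """Return a category for a PR title; unprefixed titles fall into 'Other'."""
    title = title.strip()
    if title.upper().startswith("[WIP]"):
        return "WIP/Investigation"
    match = PREFIX_RE.match(title)
    if match:
        return PREFIX_CATEGORIES.get(match.group("type").lower(), "Other")
    return "Other"


def unprefixed_share(titles):
    """Fraction of titles that carry no conventional prefix at all."""
    return sum(1 for t in titles if not PREFIX_RE.match(t.strip())) / len(titles)


sample_titles = [
    "feat: add hidden experiments command to read experiment state",
    "fix(mcp): resolve gateway auth issue",      # hypothetical title
    "Update runner config",                      # hypothetical title
    "[WIP] investigate flaky workflow",          # hypothetical title
]
for t in sample_titles:
    print(f"{categorize(t):18} {t}")
print(f"unprefixed: {unprefixed_share(sample_titles):.0%}")
```

Titles without a prefix can only be routed by weaker title-pattern heuristics, which is why a high unprefixed share inflates the "Other" bucket in the per-cluster category breakdowns.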