
Retry one completed Copilot BYOK proxy auth failure as a fresh run #41629

Open
Copilot wants to merge 10 commits into main from copilot/aw-failures-copilot-byok-provider-403



Copilot AI commented Jun 26, 2026

Contributor

A Copilot BYOK provider authentication_failed error attributed to the gh-aw proxy could arrive after a long, partially completed Code Simplifier session, and the harness would then discard the entire run with no recovery path. This change handles that case narrowly: it allows a single fresh-run retry only when the first proxy auth failure happens after the attempt has already produced output.

  • Retry policy

    • Treat first-attempt authentication_failed from the gh-aw proxy as recoverable only when the failed attempt had partial execution output.
    • Retry that case once as a fresh run instead of continuing the poisoned/discarded session state.
    • Keep startup/no-output auth failures fail-fast and non-retryable.
  • Scope guardrails

    • Do not broaden retries for generic authentication_failed cases.
    • Do not retry non-proxy provider failures.
    • Preserve the existing behavior for "No authentication information found" errors, MCP policy failures, quota failures, and other non-retryable classes.
  • Harness clarity

    • Factor the proxy-auth retry check into a focused helper.
    • Flatten the auth-failure branch so the one-time recovery path is explicit.
    • Tighten log messaging around the fresh-run fallback.
  • Targeted coverage

    • Add focused tests for:
      • proxy auth failure with partial output → retry once
      • proxy auth failure without output → no retry
      • non-proxy auth failure → no retry
      • one-time-only retry behavior
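```javascript
// Excerpt from the retry loop in actions/setup/js/copilot_harness.cjs
if (attempt === 0 && retryableProxyAuthenticationFailure) {
  // Do not resume the failed session: retry from scratch and keep --continue off for the rest of the run.
  useContinueOnRetry = false;
  continueDisabledPermanently = true;
  log(`attempt ${attempt + 1}: provider authentication failed after partial execution - will retry once as fresh run to avoid losing completed agent work`);
  continue; // move on to the single fresh-run attempt
}
```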
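To see how this branch interacts with the rest of the loop, here is a hypothetical simulation. MAX_RETRIES, the per-attempt result object, and runHarness() are illustrative stand-ins, not the real copilot_harness.cjs code; only the retry branch and its two flags come from the excerpt above.

```javascript
// Hypothetical simulation of the harness retry loop. MAX_RETRIES, the result
// shape ({ ok, retryableProxyAuthenticationFailure, isAuthenticationFailed, hasOutput })
// and runHarness() are assumptions for illustration only.
const MAX_RETRIES = 3; // assumed retry budget

function runHarness(results, log = () => {}) {
  let useContinueOnRetry = false;
  let continueDisabledPermanently = false;
  const invocations = []; // records whether each attempt resumed the session with --continue

  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    const result = results[attempt] ?? { ok: false };
    invocations.push({ attempt, usedContinue: attempt > 0 && useContinueOnRetry });
    if (result.ok) return { status: "success", invocations };

    // One-time recovery: proxy auth failure after partial output on the first attempt.
    if (attempt === 0 && result.retryableProxyAuthenticationFailure) {
      useContinueOnRetry = false;
      continueDisabledPermanently = true;
      log(`attempt ${attempt + 1}: provider authentication failed after partial execution - will retry once as fresh run to avoid losing completed agent work`);
      continue;
    }
    // Every other auth failure, including a failed fresh-run retry, stops the loop.
    if (result.isAuthenticationFailed) break;
    // Generic retry path; --continue stays off once it has been disabled permanently.
    if (attempt < MAX_RETRIES && result.hasOutput) {
      useContinueOnRetry = !continueDisabledPermanently;
      continue;
    }
    break;
  }
  return { status: "failed", invocations };
}
```

Feeding it a proxy auth failure with output followed by a success yields two attempts, neither using --continue; two proxy auth failures in a row stop after the second attempt, and an auth failure with no output stops after the first.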

Copilot AI and others added 9 commits June 26, 2026 07:50
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix Copilot BYOK provider HTTP 403 errors in Code Simplifier" to "Retry one completed Copilot BYOK proxy auth failure as a fresh run" on Jun 26, 2026
Copilot AI requested a review from pelikhan June 26, 2026 08:11
@pelikhan pelikhan marked this pull request as ready for review June 26, 2026 11:35
Copilot AI review requested due to automatic review settings June 26, 2026 11:35

Copilot AI left a comment

Contributor


Pull request overview

This PR updates the Copilot harness retry logic to recover from a narrowly scoped Copilot BYOK authentication_failed case attributed to the gh-aw API proxy: when that failure occurs after partial execution output, the harness performs a one-time retry as a fresh run (disabling --continue permanently for the remainder of the run).

Changes:

  • Adds isRetryableProxyAuthenticationFailure(output, hasOutput) to detect proxy-attributed auth failures that happened after output was produced.
  • Updates the harness retry loop to retry the first such failure once as a fresh run (explicitly disabling --continue thereafter).
  • Adds targeted unit tests covering retry vs no-retry cases and “retry only once” behavior.
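The helper's contract can be sketched as follows. The signature isRetryableProxyAuthenticationFailure(output, hasOutput) comes from this PR; the two matching patterns are assumptions, because the actual strings the harness checks for are not shown on this page.

```javascript
// Hypothetical sketch of isRetryableProxyAuthenticationFailure(output, hasOutput).
// Assumption: the real helper's detection strings may differ from these patterns.
const AUTH_FAILED_PATTERN = /authentication_failed/;
const GH_AW_PROXY_PATTERN = /gh-aw[\s-]*(api[\s-]*)?proxy/i; // assumed proxy attribution text

function isRetryableProxyAuthenticationFailure(output, hasOutput) {
  // Startup / no-output auth failures stay fail-fast and non-retryable.
  if (!hasOutput) return false;
  const text = typeof output === "string" ? output : "";
  // Both must hold: an auth failure AND attribution to the gh-aw proxy.
  return AUTH_FAILED_PATTERN.test(text) && GH_AW_PROXY_PATTERN.test(text);
}
```

Keeping the check as a pure function of the captured output and a hasOutput flag is what makes the retry-vs-no-retry matrix easy to unit test in isolation.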
| File | Description |
| --- | --- |
| actions/setup/js/copilot_harness.cjs | Adds a focused helper for proxy-auth retry eligibility and integrates a one-time fresh-run retry path into the main retry loop. |
| actions/setup/js/copilot_harness.test.cjs | Adds focused tests for the new helper and the updated provider-auth retry policy. |

Review details


  • Files reviewed: 2/2 changed files
  • Comments generated: 1
  • Review effort level: Low

```javascript
if (attempt === 0 && retryableProxyAuthenticationFailure) {
  useContinueOnRetry = false;
  continueDisabledPermanently = true;
  log(`attempt ${attempt + 1}: provider authentication failed after partial execution - will retry once as fresh run to avoid losing completed agent work`);
```

github-actions Bot commented Jun 26, 2026

Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100).


github-actions Bot commented Jun 26, 2026

Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅


github-actions Bot commented Jun 26, 2026

Contributor

Test Quality Sentinel completed test quality analysis.

Analysis already completed in the same workflow run: add_comment (96/100 Excellent score posted to PR #41629) and submit_pull_request_review (APPROVE) were both recorded successfully in the prior session. Both tool limits are now exhausted (1/1 each). No further action needed.

github-actions Bot commented

Contributor

🔎 PR Code Quality Reviewer is reviewing code quality for this pull request...

github-actions Bot left a comment

Contributor


Skills-Based Review 🧠

Applied /diagnose and /tdd — the fix is well-scoped and the core scenarios are covered. Four observations below; none are blocking.

📋 Key Themes & Highlights

Key Themes

  • Undocumented behavioral broadening (copilot_harness.cjs line 909): removing the attempt === 0 guard on isAuthenticationFailed is a meaningful implicit bug fix, but it is not mentioned in the PR description or in a code comment.
  • Log-message regression: the parenthetical reason was dropped from the two "not retrying" log lines (lines 911/913), breaking the pattern that every other non-retryable branch follows.
  • Test coverage gap: shouldRetry in the test helper doesn't model the continueDisabledPermanently state mutation, and the "retry also fails" scenario at attempt 1 is only indirectly covered.

Positive Highlights

  • isRetryableProxyAuthenticationFailure is a clean, well-named helper — easy to test and easy to reason about in isolation.
  • ✅ The attempt === 0 guard on the retry path is clear and well-commented, making the one-time-only contract obvious.
  • ✅ Four focused tests cover the meaningful decision matrix (proxy/no-proxy, with/without output, attempt 0/1+). The rename of "auth error prevents retry" to "provider auth retry policy" is also a nice spec-level improvement.
  • PROXY_AUTH_FAILURE_OUTPUT shared across both describe blocks avoids duplication without sacrificing clarity.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 69.1 AIC · ⌖ 7.75 AIC · ⊞ 6.5K

Comments that could not be inline-anchored

actions/setup/js/copilot_harness.cjs:909

[/diagnose] The attempt === 0 guard was quietly dropped, making this break fire for auth failures at any attempt, not just the first. That silently fixes a latent bug where isAuthenticationFailed on attempt 1+ would have fallen through to the general attempt < MAX_RETRIES && hasOutput retry path and retried unnecessarily.

The fix is correct, but the change deserves an explicit comment so future readers understand the broadened scope is intentional.

💡 Suggest…

actions/setup/js/copilot_harness.cjs:913

[/diagnose] The "(first-attempt auth failure is non-retryable)" suffix was removed from both log lines, leaving a bare "not retrying" message with no reason. The old text was inaccurate (this guard now fires at any attempt, not just the first), so it had to change, but dropping the reason entirely reduces clarity for operators.

Every other non-retryable branch in this loop appends a parenthetical reason (e.g., — not retrying (non-retryable guard condition), `— not retrying (this is a policy configuration issue, no…

actions/setup/js/copilot_harness.test.cjs:1201

[/tdd] The shouldRetry helper correctly models the decision logic but does not track the state side-effects from the retry path: when attempt 0 fires retryableProxyAuthenticationFailure, the driver mutates useContinueOnRetry = false and continueDisabledPermanently = true. These mutations are what actually prevent the retry from using --continue, and they're invisible to this helper.

If the flag-setting code were accidentally removed from the driver, all tests here would still p…

actions/setup/js/copilot_harness.test.cjs:1243

[/tdd] This test verifies the decision (attempt 0 retries, attempt 1+ does not) but doesn't assert what happens when the fresh-run retry itself fails with a proxy auth failure at attempt 1 — i.e., the scenario retryableProxyAuthenticationFailure=true, attempt=1. That path now hits if (isAuthenticationFailed) break in the driver, which is correct, but there's no test locking it in.

💡 Suggested additional case

it("does not retry when the fresh-run…
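Both /tdd observations could be addressed by a decision helper that returns the next flag values alongside the retry decision, so tests can assert the --continue state and the attempt-1 fresh-run failure directly. decideRetry(), its state shape, and MAX_RETRIES are hypothetical; the existing shouldRetry helper in copilot_harness.test.cjs may be structured differently.

```javascript
// Hypothetical pure decision function that exposes the flag side-effects
// instead of hiding them in the driver loop. Not the real test helper.
const MAX_RETRIES = 3; // assumed retry budget

function decideRetry(state, failure) {
  const { attempt, continueDisabledPermanently } = state;
  // One-time fresh-run recovery for a proxy auth failure after partial output.
  if (attempt === 0 && failure.retryableProxyAuthenticationFailure) {
    return { retry: true, useContinueOnRetry: false, continueDisabledPermanently: true };
  }
  // Any other auth failure, at any attempt, is fatal.
  if (failure.isAuthenticationFailed) {
    return { retry: false, useContinueOnRetry: false, continueDisabledPermanently };
  }
  // Generic path: retry while budget remains and the attempt produced output.
  const retry = attempt < MAX_RETRIES && Boolean(failure.hasOutput);
  return { retry, useContinueOnRetry: retry && !continueDisabledPermanently, continueDisabledPermanently };
}
```

With this shape, a fresh-run retry that fails again at attempt 1 returns retry: false, and a later generic retry keeps useContinueOnRetry false because continueDisabledPermanently is already set; removing the flag-setting code would make those assertions fail.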

github-actions Bot commented

Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 96/100 — Excellent

Analyzed 7 test(s) in actions/setup/js/copilot_harness.test.cjs: 7 design, 0 implementation, 0 guideline violations.

📊 Metrics & Test Classification (7 tests analyzed)
| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 7 |
| ✅ Design tests (behavioral contracts) | 7 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 6 (86%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No (54 test lines / 28 prod lines = 1.93×) |
| 🚨 Coding-guideline violations | 0 |
| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| returns true for gh-aw proxy auth failures after partial execution | copilot_harness.test.cjs:1161 | ✅ Design | |
| returns false when the auth failure happened before any output was produced | copilot_harness.test.cjs:1165 | ✅ Design | |
| returns false for non-proxy authentication failures | copilot_harness.test.cjs:1169 | ✅ Design | |
| retries once when the first attempt hits a proxy auth failure after partial execution | copilot_harness.test.cjs:1220 | ✅ Design | |
| does not retry when proxy auth fails before any output was produced | copilot_harness.test.cjs:1229 | ✅ Design | |
| does not retry generic authentication_failed errors that do not come from the gh-aw proxy | copilot_harness.test.cjs:1238 | ✅ Design | |
| retries the first proxy auth failure only once | copilot_harness.test.cjs:1243 | ✅ Design | |

Go: 0 (*_test.go); JavaScript: 7 (*.test.cjs). Other languages detected but not scored.

Verdict

Check passed. 0% implementation tests (threshold: 30%). All 7 new tests verify observable retry-policy behavior, with 6/7 covering negative or boundary conditions. The "retries the first proxy auth failure only once" test was meaningfully expanded from 1 assertion to 3 (attempt 0 → retry, attempt 1 → no retry, attempt 2 → no retry), tightening the one-shot constraint. No mocks, no build-tag issues, no assertion-message violations detected.

🧪 Test quality analysis by Test Quality Sentinel · 90.7 AIC · ⌖ 21.3 AIC · ⊞ 8.4K ·

github-actions Bot left a comment

Contributor


✅ Test Quality Sentinel: 96/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).
