
feat(runtime-fallback): auto-retry, generic retry detection, and timeout toggle#1777

Open
youngbinkim0 wants to merge 65 commits into code-yeongyu:dev from youngbinkim0:feat/runtime-fallback-only


@youngbinkim0 (Contributor) commented Feb 11, 2026

Summary

This PR implements runtime fallback auto-retry that preserves the active agent when retrying on a fallback model. Together, these fixes make runtime fallback work end-to-end in production.

What Changed

Bug Fixes (9 total)

  1. Nested status code detection - Anthropic errors nest the status code under data.statusCode
  2. Regex pattern relaxed - Match "credit balance too low" with flexible spacing
  3. Schema limit raised - max_fallback_attempts now accepts up to 20 (was 10)
  4. Fallback models discovery - Handle sparse session.error events that lack agent info
  5. Model fallback init - Derive the model from agent config when events lack model info
  6. Auto-retry with promptAsync - Resend the last user message after selecting a fallback model
  7. Persistent fallback - The fallback model now persists across messages in the session
  8. Manual model override - Detect user-initiated model changes and reset fallback state
  9. Agent preservation - Resolve the correct agent for auto-retry (previously defaulted to sisyphus)
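Bug fix 1 depends on where the status code lives in the error payload. Below is a minimal TypeScript sketch, assuming errors arrive as plain objects with the code either at the top level (`statusCode` or `status`, both assumed field names) or nested as `data.statusCode` (the Anthropic shape named above). `extractStatusCode` borrows the helper name mentioned later in this PR, but the body is illustrative, not the PR's code.

```typescript
// Sketch only: read a status code that may sit at the top level or be nested
// under `data` (the Anthropic shape described in this PR). Field names other
// than `data.statusCode` are assumptions.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function readStatus(value: unknown): number | undefined {
  if (typeof value === "number" && Number.isInteger(value)) return value;
  // Assumed: some providers may send the code as a string such as "529".
  if (typeof value === "string" && /^\d{3}$/.test(value)) return Number(value);
  return undefined;
}

function extractStatusCode(error: unknown): number | undefined {
  if (!isRecord(error)) return undefined;

  // 1. Top-level fields, e.g. { statusCode: 429 } or { status: 503 }.
  const direct = readStatus(error.statusCode) ?? readStatus(error.status);
  if (direct !== undefined) return direct;

  // 2. Nested payload, e.g. { data: { statusCode: 529 } }.
  const data = error.data;
  if (isRecord(data)) {
    return readStatus(data.statusCode) ?? readStatus(data.status);
  }
  return undefined;
}
```

Checking the top level first means a provider that reports the code in both places gets the outer value; the nested lookup only runs when the outer fields are absent.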

New Features

  • Provider-agnostic retry detection - Generalized pattern for Copilot, OpenAI, and future providers
  • Timeout toggle - timeout_seconds: 0 disables quota-based fallback escalation
  • Agent/category fallback_models - Support custom fallback chains per agent/category
  • Provider-aware model resolution - Check connectivity before matching fallback models
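"Provider-agnostic" retry detection can be read as: decide from the status code and the error text, never from which provider raised the error. A hedged TypeScript sketch follows. The status codes (429, 503, 529, plus 400) and the three message patterns are the ones quoted elsewhere in this PR thread; the hook's real `retry_on_errors` defaults and `RETRYABLE_ERROR_PATTERNS` list may contain more, and the `ProviderError` shape is assumed.

```typescript
// Sketch of provider-agnostic retry detection. Codes and patterns are taken
// from this PR thread; the real lists in the hook may differ.
const DEFAULT_RETRY_ON_ERRORS: readonly number[] = [400, 429, 503, 529];

const RETRYABLE_MESSAGE_PATTERNS: readonly RegExp[] = [
  /credit.*balance.*too.*low/i, // tolerates extra words between the keywords
  /quota.?protection/i,
  /key.?limit.?exceeded/i,
];

// Assumed minimal error shape after status extraction.
interface ProviderError {
  statusCode?: number;
  message?: string;
}

function isRetryableError(
  error: ProviderError,
  retryOnErrors: readonly number[] = DEFAULT_RETRY_ON_ERRORS,
): boolean {
  // A configured status code is enough on its own.
  if (error.statusCode !== undefined && retryOnErrors.includes(error.statusCode)) {
    return true;
  }
  // Otherwise fall back to message matching, which works for any provider.
  const message = error.message ?? "";
  return RETRYABLE_MESSAGE_PATTERNS.some((pattern) => pattern.test(message));
}
```

Passing a custom list as the second argument stands in for a user's `retry_on_errors` setting; that mapping is an assumption of this sketch.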

Testing

✅ 44 runtime-fallback tests passing
✅ 46 schema tests passing
✅ TypeScript compilation clean
✅ Manual end-to-end verification (Anthropic error → auto-fallback → agent preserved → manual override works)

Related

Builds on #1408 (feat/runtime-fallback-only)

Rebase Bot and others added 20 commits February 4, 2026 19:41
Add configuration schemas for runtime model fallback feature:
- RuntimeFallbackConfigSchema with enabled, retry_on_errors,
  max_fallback_attempts, cooldown_seconds, notify_on_fallback
- FallbackModelsSchema for init-time fallback model selection
- Add fallback_models to AgentOverrideConfigSchema and CategoryConfigSchema
- Export types and schemas from config/index.ts

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Add Category-level fallback_models support in getFallbackModelsForSession()
  - Try agent-level fallback_models first
  - Then try agent's category fallback_models
  - Support all builtin agents including hephaestus, sisyphus-junior, build, plan

- Expand agent name recognition regex to include:
  - hephaestus, sisyphus-junior, build, plan, multimodal-looker

- Add comprehensive test coverage (6 new tests, total 24):
  - Model switching via chat.message hook
  - Agent-level fallback_models configuration
  - SessionID agent pattern detection
  - Cooldown mechanism validation
  - Max attempts limit enforcement

All 24 tests passing

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Implement full fallback_models support across all integration points:

1. Model Resolution Pipeline (src/shared/model-resolution-pipeline.ts)
   - Add userFallbackModels to ModelResolutionRequest
   - Process user fallback_models before hardcoded fallback chain
   - Support both connected provider and availability checking modes

2. Agent Utils (src/agents/utils.ts)
   - Update applyModelResolution to accept userFallbackModels
   - Inject fallback_models for all builtin agents (sisyphus, oracle, etc.)
   - Support both single string and array formats

3. Model Resolver (src/shared/model-resolver.ts)
   - Add userFallbackModels to ExtendedModelResolutionInput type
   - Pass through to resolveModelPipeline

4. Delegate Task Executor (src/tools/delegate-task/executor.ts)
   - Extract category fallback_models configuration
   - Pass to model resolution pipeline
   - Register session category for runtime-fallback hook

5. Session Category Registry (src/shared/session-category-registry.ts)
   - New module: maps sessionID -> category
   - Used by runtime-fallback to lookup category fallback_models
   - Auto-cleanup support

6. Runtime Fallback Hook (src/hooks/runtime-fallback/index.ts)
   - Check SessionCategoryRegistry first for category fallback_models
   - Fall back to agent-level configuration
   - Import and use SessionCategoryRegistry
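The registry in item 5 and the lookup order in item 6 can be sketched together. This is a minimal TypeScript illustration, assuming a `Map`-backed registry; the method names (`register`, `get`, `remove`), the config shape, and `lookupFallbackModels` are hypothetical, since the commit only states the sessionID -> category mapping, the cleanup, and "category first, then agent-level".

```typescript
// Hypothetical sketch: the PR names a SessionCategoryRegistry that maps
// sessionID -> category; these method names are illustrative.
class SessionCategoryRegistry {
  private readonly categories = new Map<string, string>();

  register(sessionID: string, category: string): void {
    this.categories.set(sessionID, category);
  }

  get(sessionID: string): string | undefined {
    return this.categories.get(sessionID);
  }

  // Cleanup hook for when a delegated task completes or errors.
  remove(sessionID: string): void {
    this.categories.delete(sessionID);
  }
}

interface FallbackSource {
  fallback_models?: string[];
}

interface FallbackLookupConfig {
  categories: Record<string, FallbackSource | undefined>;
  agents: Record<string, FallbackSource | undefined>;
}

// Lookup order from item 6: category fallback_models first, then agent-level.
function lookupFallbackModels(
  registry: SessionCategoryRegistry,
  config: FallbackLookupConfig,
  sessionID: string,
  agent: string | undefined,
): string[] {
  const category = registry.get(sessionID);
  const fromCategory =
    category !== undefined ? config.categories[category]?.fallback_models : undefined;
  if (fromCategory !== undefined && fromCategory.length > 0) return fromCategory;

  const fromAgent = agent !== undefined ? config.agents[agent]?.fallback_models : undefined;
  return fromAgent ?? [];
}
```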

Test Results:
- runtime-fallback: 24/24 tests passing
- model-resolver: 46/46 tests passing

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…ching

Replace word-boundary regex with stricter patterns that match
status codes only at start/end of string or surrounded by whitespace.

Prevents false matches like '1429' or '4290'.
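The stricter rule in this commit maps directly onto a regex built from `(?:^|\s)` and `(?:\s|$)`. A minimal TypeScript sketch follows, assuming the code is searched for in a free-text error message; the helper names are hypothetical.

```typescript
// Match a status code only at the start/end of the string or when surrounded
// by whitespace, so "1429" or "4290" never count as "429".
function statusCodePattern(code: number): RegExp {
  return new RegExp(`(?:^|\\s)${code}(?:\\s|$)`);
}

// Hypothetical helper: true if any of the given codes appears as a standalone token.
function messageHasStatusCode(message: string, codes: readonly number[]): boolean {
  return codes.some((code) => statusCodePattern(code).test(message));
}
```

One consequence of the whitespace-only rule as described: a code glued to punctuation, such as `(429)` or `429:`, is also rejected, which is stricter than `\b` in both directions.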

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add shared utility to normalize fallback_models config values.

Handles both single string and array inputs consistently.
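A normalizer like this is small but worth pinning down, because the five call sites it replaces must agree on edge cases. A hedged TypeScript sketch follows. The PR gives the name `normalizeFallbackModels` and the string-or-array input. Returning `[]` for a missing value and trimming or dropping blank entries are assumptions of this sketch.

```typescript
// Sketch: accept the raw config value (single string, array, or absent) and
// always return an array. The empty-array default and blank filtering are assumed.
function normalizeFallbackModels(
  value: string | readonly string[] | undefined | null,
): string[] {
  if (value == null) return [];
  const list = typeof value === "string" ? [value] : [...value];
  return list.map((model) => model.trim()).filter((model) => model.length > 0);
}
```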

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Replace 5 instances of inline fallback_models normalization with
the shared normalizeFallbackModels() utility function.

Eliminates code duplication and ensures consistent behavior.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Resolved conflicts in:
- src/config/schema.ts (kept both hooks)
- src/hooks/index.ts (exported both hooks)
- src/index.ts (imported both hooks)
- src/shared/index.ts (exported both utilities)
Resolved conflicts in:
- src/config/schema.ts (HookNameSchema + OhMyOpenCodeConfigSchema)
- src/agents/utils.ts (imports + model resolution calls)
- docs/configurations.md (category options table + runtime fallback docs)
- src/hooks/AGENTS.md (hook list)
- src/tools/delegate-task/executor.ts (imports + session category registry)
- src/tools/delegate-task/tools.test.ts (test case updates)
- src/features/background-agent/manager.ts (cleanup + SessionCategoryRegistry)
- Fix bun.lock version conflicts (3.3.1 -> 3.3.2)
- Remove Git conflict markers from docs/configurations.md
- Remove duplicate normalizeFallbackModels, import from shared module
Implements runtime model fallback that automatically switches to backup models
when the primary model encounters transient errors (rate limits, overload, etc.).

Features:
- runtime_fallback configuration with customizable error codes, cooldown, notifications
- Runtime fallback hook intercepts API errors (429, 503, 529)
- Support for fallback_models from agent/category configuration
- Session-state TTL and periodic cleanup to prevent memory leaks
- Robust agent name detection with explicit AGENT_NAMES array
- Session category registry for category-specific fallback lookup
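"Session-state TTL and periodic cleanup" can be sketched as a map whose entries carry a last-touched timestamp plus a sweep that drops stale ones. This TypeScript version is illustrative only. The class name, the state fields, and the injected clock (used so the sweep is testable) are assumptions, since the commit does not state the TTL value or the data layout.

```typescript
// Hypothetical per-session fallback state.
interface FallbackState {
  currentModel: string;
  attempts: number;
}

interface StoredState extends FallbackState {
  lastTouched: number; // epoch milliseconds
}

class SessionStateStore {
  private readonly states = new Map<string, StoredState>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Every write refreshes the session's timestamp.
  touch(sessionID: string, state: FallbackState): void {
    this.states.set(sessionID, { ...state, lastTouched: this.now() });
  }

  get(sessionID: string): FallbackState | undefined {
    return this.states.get(sessionID);
  }

  // Remove sessions not touched within the TTL; returns how many were dropped.
  sweep(): number {
    const cutoff = this.now() - this.ttlMs;
    let removed = 0;
    for (const [sessionID, state] of this.states) {
      if (state.lastTouched < cutoff) {
        this.states.delete(sessionID); // deleting during Map iteration is safe
        removed += 1;
      }
    }
    return removed;
  }
}
```

In a long-running process, `sweep()` would typically run on a timer such as `setInterval`. On Node, calling `unref()` on that timer keeps it from holding the process open.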

Schema changes:
- Add RuntimeFallbackConfigSchema with enabled, retry_on_errors, max_fallback_attempts,
  cooldown_seconds, notify_on_fallback options
- Add fallback_models to AgentOverrideConfigSchema and CategoryConfigSchema
- Add runtime-fallback to HookNameSchema

Files added:
- src/hooks/runtime-fallback/index.ts - Main hook implementation
- src/hooks/runtime-fallback/types.ts - Type definitions
- src/hooks/runtime-fallback/constants.ts - Constants and defaults
- src/hooks/runtime-fallback/index.test.ts - Comprehensive tests
- src/config/schema/runtime-fallback.ts - Schema definition
- src/shared/session-category-registry.ts - Session category tracking

Files modified:
- src/hooks/index.ts - Export runtime-fallback hook
- src/plugin/hooks/create-session-hooks.ts - Register runtime-fallback hook
- src/config/schema.ts - Export runtime-fallback schema
- src/config/schema/oh-my-opencode-config.ts - Add runtime_fallback config
- src/config/schema/agent-overrides.ts - Add fallback_models to agent config
- src/config/schema/categories.ts - Add fallback_models to category config
- src/config/schema/hooks.ts - Add runtime-fallback to hook names
- src/shared/index.ts - Export session-category-registry
- docs/configurations.md - Add Runtime Fallback documentation
- docs/features.md - Add runtime-fallback to hooks list

Supersedes code-yeongyu#1237, code-yeongyu#1408
Closes code-yeongyu#1408
- Add normalizeFallbackModels helper to centralize string/array normalization (P3)
- Export RuntimeFallbackConfig and FallbackModels types from config/index.ts
- Fix agent detection regex to use word boundaries for sessionID matching
- Improve tests to verify actual fallback switching logic (not just log paths)
- Add SessionCategoryRegistry cleanup in executeSyncTask on completion/error (P2)
- All 24 runtime-fallback tests pass, 115 delegate-task tests pass
…-my-opencode into feat/runtime-fallback-only
…gent detection

The \b word boundary regex treats '-' as a boundary, causing
'sisyphus-junior-session-123' to incorrectly match 'sisyphus'
instead of 'sisyphus-junior'.

Sorting agent names by length (descending) ensures longer names
are matched first, fixing the hyphenated agent detection issue.

Fixes cubic-dev-ai review issue code-yeongyu#8
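The fix works because regex alternation tries alternatives left to right at each position. With `\b(sisyphus|sisyphus-junior)\b`, the shorter name matches first and `\b` succeeds before the `-`, so the result is `sisyphus`. Sorting longest-first gives `sisyphus-junior` the first attempt. Below is a minimal TypeScript sketch using agent names that appear in this PR. The real `AGENT_NAMES` array may differ, and `detectAgentFromSessionID` is a hypothetical name.

```typescript
// Agent names mentioned in this PR; the real AGENT_NAMES array may differ.
const AGENT_NAMES: readonly string[] = [
  "sisyphus", "sisyphus-junior", "hephaestus", "oracle", "prometheus",
  "atlas", "build", "plan", "multimodal-looker",
];

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Longest names first, so "sisyphus-junior" is tried before its prefix "sisyphus".
const agentPattern = new RegExp(
  `\\b(${[...AGENT_NAMES]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|")})\\b`,
  "i",
);

function detectAgentFromSessionID(sessionID: string): string | undefined {
  return agentPattern.exec(sessionID)?.[1]?.toLowerCase();
}
```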
…servation, and model override

Bug fixes:
1. extractStatusCode: handle nested data.statusCode (Anthropic error structure)
2. Error regex: relax credit.*balance.*too.*low pattern for multi-char gaps
3. Zod schema: bump max_fallback_attempts from 10 to 20 (config rejected silently)
4. getFallbackModelsForSession: fall back to sisyphus/any agent when session.error lacks agent info
5. Model detection: derive model from agent config when session.error lacks model info
6. Auto-retry: resend last user message with fallback model via promptAsync
7. Persistent fallback: override model on every chat.message (not just pendingFallbackModel)
8. Manual model change: detect UI model changes and reset fallback state
9. Agent preservation: include agent in promptAsync body to prevent defaulting to sisyphus
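Fixes 7 and 8 interact. The fallback must be re-applied on every `chat.message`, because the UI still shows the original (rate-limited) model, yet a deliberate user switch must clear it. One way to tell the two apart is sketched below in TypeScript, under a loudly stated assumption: a request for either the failed model or the current fallback is treated as "no change", and any other model is treated as a manual override. The state shape and `resolveModelForMessage` are hypothetical.

```typescript
// Hypothetical per-session state for this sketch.
interface SessionFallbackState {
  originalModel?: string; // model that hit the retryable error
  fallbackModel?: string; // model selected by runtime fallback
}

// Returns the model to use for an outgoing chat message.
function resolveModelForMessage(state: SessionFallbackState, requestedModel: string): string {
  if (state.fallbackModel === undefined) return requestedModel;

  // Assumed heuristic: the UI still sends the original model after a fallback,
  // so only a *different* model counts as a manual override.
  const unchanged =
    requestedModel === state.originalModel || requestedModel === state.fallbackModel;
  if (!unchanged) {
    state.originalModel = undefined;
    state.fallbackModel = undefined;
    return requestedModel;
  }

  // Persist the fallback on every message, not just the first retry.
  return state.fallbackModel;
}
```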

Additional:
- Add sessionRetryInFlight guard to prevent double-retries
- Add resolveAgentForSession with 3-tier resolution (event → session memory → session ID)
- Add normalizeAgentName for display names like "Prometheus (Planner)" → "prometheus"
- Add resolveAgentForSessionFromContext to fetch agent from session messages
- Move AGENT_NAMES and agentPattern to module scope for reuse
- Register runtime-fallback hooks in event.ts and chat-message.ts
- Remove diagnostic debug logging from isRetryableError
- Add 400 to default retry_on_errors and credit/balance patterns to RETRYABLE_ERROR_PATTERNS
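The three helpers in this list compose naturally. Below is a hedged TypeScript sketch of the 3-tier resolution (event → session memory → session ID), a display-name normalizer, and an in-flight guard in the spirit of `sessionRetryInFlight`. The normalization rule is an assumption: strip a trailing parenthetical, trim, lowercase, and turn spaces into hyphens. The signatures are illustrative, with the session-ID detector injected rather than hard-coded.

```typescript
// Assumed rule: "Prometheus (Planner)" -> "prometheus".
function normalizeAgentName(name: string | undefined): string | undefined {
  if (!name) return undefined;
  const base = name
    .replace(/\s*\(.*\)\s*$/, "") // drop a trailing "(Role)" suffix
    .trim()
    .toLowerCase()
    .replace(/\s+/g, "-");
  return base.length > 0 ? base : undefined;
}

// Tiers in the order the commit lists them: event -> session memory -> session ID.
function resolveAgentForSession(
  sessionID: string,
  eventAgent: string | undefined,
  sessionAgents: ReadonlyMap<string, string>,
  detectFromSessionID: (id: string) => string | undefined,
): string | undefined {
  return (
    normalizeAgentName(eventAgent) ??
    normalizeAgentName(sessionAgents.get(sessionID)) ??
    detectFromSessionID(sessionID)
  );
}

// At most one auto-retry per session at a time; a second caller is turned away.
const retryInFlight = new Set<string>();

async function withRetryGuard(sessionID: string, retry: () => Promise<void>): Promise<boolean> {
  if (retryInFlight.has(sessionID)) return false;
  retryInFlight.add(sessionID); // added synchronously, before the first await
  try {
    await retry();
    return true;
  } finally {
    retryInFlight.delete(sessionID);
  }
}
```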
github-actions bot (Contributor) commented Feb 11, 2026

All contributors have signed the CLA. Thank you! ✅
Posted by the CLA Assistant Lite bot.

@youngbinkim0 (Contributor Author)

I have read the CLA Document and I hereby sign the CLA

@youngbinkim0 (Contributor Author)

recheck


github-actions bot added a commit that referenced this pull request Feb 11, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 27 files

Confidence score: 3/5

  • Potential user-impacting inconsistency in src/shared/model-resolution-pipeline.ts: new userFallbackModels reads connected providers only from cache, ignoring constraints.connectedProviders, which could change model resolution behavior.
  • Code duplication in src/hooks/runtime-fallback/index.ts auto-retry handlers raises maintenance risk and could lead to subtle divergence, though it’s not an immediate blocker.
  • Pay close attention to src/shared/model-resolution-pipeline.ts and src/hooks/runtime-fallback/index.ts - connected provider handling and duplicated retry logic.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/hooks/runtime-fallback/index.ts">

<violation number="1" location="src/hooks/runtime-fallback/index.ts:473">
P2: Significant code duplication in auto-retry logic between `session.error` and `message.updated` event handlers. The logic for fetching session messages, extracting the last user message, parsing parts, and calling `promptAsync` is duplicated almost verbatim (approximately 40+ lines). This increases maintenance burden and risk of inconsistencies.</violation>
</file>

<file name="src/shared/model-resolution-pipeline.ts">

<violation number="1" location="src/shared/model-resolution-pipeline.ts:105">
P1: Inconsistent handling of `constraints.connectedProviders` - the new `userFallbackModels` logic reads connected providers only from cache, ignoring the `constraints.connectedProviders` parameter that is respected in `categoryDefaultModel` and `fallbackChain` logic. This prevents callers from enforcing specific provider constraints in the user fallback code path.</violation>
</file>


Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

…r constraint inconsistency

- Extract duplicated auto-retry logic (~40 lines each) from session.error and
  message.updated handlers into shared autoRetryWithFallback() helper
- Fix userFallbackModels path in model-resolution-pipeline to respect
  constraints.connectedProviders parameter instead of reading cache directly,
  matching the behavior of categoryDefaultModel and fallbackChain paths
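The second fix reduces to one rule: when the caller passes `constraints.connectedProviders`, use it, and read the provider cache only when it is absent. A minimal TypeScript sketch follows, assuming model IDs of the form `provider/model`. The type name, `resolveUserFallbackModel`, and the injected `readCache` callback (standing in for the real cache read) are hypothetical.

```typescript
// Sketch: honor constraints.connectedProviders in the user fallback path,
// consulting the provider cache only when the caller supplies no constraint.
interface ResolutionConstraints {
  connectedProviders?: readonly string[];
}

// Assumed "provider/model" ID format.
function providerOf(modelID: string): string {
  const slash = modelID.indexOf("/");
  return slash === -1 ? modelID : modelID.slice(0, slash);
}

function resolveUserFallbackModel(
  userFallbackModels: readonly string[],
  constraints: ResolutionConstraints,
  readCache: () => readonly string[],
): string | undefined {
  // `??` only falls through on undefined/null, so an explicit [] stays authoritative.
  const connected = new Set(constraints.connectedProviders ?? readCache());
  return userFallbackModels.find((model) => connected.has(providerOf(model)));
}
```

Using `??` rather than `||` matters here. A caller that passes an empty list is saying "no providers are allowed", and that should not be silently replaced by whatever the cache holds.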
@youngbinkim0 (Contributor Author) commented Feb 20, 2026

One situation that can get a little messy, and that I'm not sure works perfectly, is using prometheus to create a plan. At the end, it gives you the /start-work slash command to run. When you run the slash command and the model for atlas is rate-limited, you get the message about falling back to the next model. However, it won't switch to atlas. Instead, it changes the model and inserts a "continue" message, but prometheus receives that message and replies that you need to run the slash command to start work.

When you run a slash command that targets an agent and the model used is rate limited, we need to switch to the agent, then rotate to the next model, then enter the message of continue.

The second issue I noticed is odd behavior when the compact command runs. If the model that will be used for compacting is rate-limited, it never actually generates the compacted contents. It just hangs there with no contents, and you have to undo and recover the session state by running compact manually and then continuing.

@rothnic

  1. Great catch — I've seen this too. When /start-work fires and atlas's model is rate-limited, the fallback rotates the model but doesn't actually switch the agent, so prometheus ends up receiving the "continue" message and loops back to "run the slash command."
    This is a real issue but I think it deserves its own PR. The current runtime-fallback work (this PR) fixes agent resolution after a session is established — the scenario you're describing is about the agent switch + model selection failing atomically during session creation, which touches different parts of the flow (/start-work orchestration, session creation timing). Mixing the two would make both harder to review and test.
    Would you be open to filing a separate issue for this? Happy to collaborate on it once this PR lands.

  2. Separately — the compaction hanging when the compaction model is rate-limited is also worth tracking, but that lives in a completely different subsystem (anthropic-context-window-limit-recovery / preemptive-compaction), not runtime-fallback. Probably best as its own issue too so it gets the right eyes on it.

@youngbinkim0 youngbinkim0 marked this pull request as ready for review February 20, 2026 00:24

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 51 files

Confidence score: 4/5

  • Schema duplication in assets/oh-my-opencode.schema.json repeats fallback_models 15 times instead of using $defs/$ref, which raises maintenance risk and increases chance of future inconsistency if edits are missed
  • Overall risk appears low because this is a schema maintainability concern rather than a direct runtime regression, so this PR still seems safe to merge
  • Pay close attention to assets/oh-my-opencode.schema.json - consider consolidating fallback_models into $defs to reduce duplication
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="assets/oh-my-opencode.schema.json">

<violation number="1" location="assets/oh-my-opencode.schema.json:304">
P2: Significant schema duplication: `fallback_models` definition is repeated 15 times across agent configurations instead of using JSON Schema `$defs` with `$ref` references</violation>
</file>


youngbinkim0 and others added 6 commits February 19, 2026 20:50
- Extract repeated AgentOverrideConfig to definitions block
- Replace 14 inline copies with $ref references
- Reduce schema file from 3563 to 1033 lines (-70%)
The test 'should NOT interrupt running session with no progress' had its
assertions incorrectly changed during merge conflict resolution. Restored
upstream/dev assertions: running sessions are never stale-killed, matching
the implementation's sessionIsRunning guard at line 1741.
@rothnic commented Feb 21, 2026

  1. Great catch — I've seen this too. When /start-work fires and atlas's model is rate-limited, the...

@youngbinkim0, one other case I encountered, which I wanted to mention so it also goes into a separate issue, involves the todo continuation and how fallback works in general.

Let's say you have an agent working on something that hasn't hit rate limits yet, but it has just used up your last request, so the next request hits the rate limit and the agent stops with an incomplete todo list. The selected model is the one that reached its usage limit, so it has to fall back. Sometimes, while the fallback is pending, OmO seems to trigger the todo continuation again, which reconsiders the fallback from scratch. The result is a loop: the fallback never gets to happen, and the todo continuation kicks in each time.

In other words, there are potentially a couple of issues here:

  • The selected model isn't updated with the model we are trying to fallback to
  • The todo continuation functionality seems almost unaware of a fallback being in progress

@davidakerr commented Feb 22, 2026

I've seen some interrupted messages on the fallbacks.

(two screenshots attached)

@rothnic commented Feb 22, 2026

@davidakerr, is that using the opencode desktop application? I just tried it out for the first time earlier and also ran into an issue where a session was interrupted in a similar situation.

@davidakerr commented Feb 22, 2026

For these tests, I'm running the opencode serve WebUI.

@youngbinkim0 (Contributor Author)

@davidakerr @rothnic Thanks for the reports and screenshots! We've filed three follow-up issues to track the problems you've been hitting:

Your screenshots show MiniMax M2.5 Free · Interrupted — this looks like a mid-stream interruption (the model starts responding then gets cut off), which is a slightly different path from the pre-request rate-limit errors (429s) that this PR's patterns handle. The patterns added here (/quota.?protection/i, /key.?limit.?exceeded/i) catch errors returned before any content streams. Mid-stream disconnects from free-tier models are an additional edge case that #2063 in particular would help with — smoother auto-recovery instead of dropping to "Interrupted" and requiring a manual "continue" click.

We'll keep this PR focused on the error pattern additions and address the interruption/recovery flow in those follow-up issues.

Fixes two related agent resolution bugs:

1. General UI agent switch: Use updateSessionAgent instead of write-once
   setSessionAgent. When switching agents on an existing session, the agent
   map was silently not updating (setSessionAgent has if (!has(sessionID))
   guard). This caused fallback to use the wrong agent's models.

2. /start-work race: Detect <session-context> marker and pin atlas agent
   immediately in chat-message.ts, before any hooks or model requests fire.
   Previously the pin happened in start-work-hook.ts which could race with
   model errors, causing fallback to resolve to prometheus instead of atlas.
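The difference between the two writers in fix 1, and the early pin in fix 2, can be shown with a `Map`. This TypeScript sketch is hedged. Only the write-once `has` guard and the `<session-context>` marker come from the commit text. The Map-backed store, `pinAgentForStartWork`, and the plain substring check for the marker are assumptions.

```typescript
// Hypothetical Map-backed stand-in for the session -> agent memory.
const sessionAgents = new Map<string, string>();

// Write-once: ignored if the session already has an agent. This is the old
// behavior that made UI agent switches silently not stick.
function setSessionAgent(sessionID: string, agent: string): void {
  if (!sessionAgents.has(sessionID)) sessionAgents.set(sessionID, agent);
}

// Always overwrites, so a mid-session agent switch is visible to fallback.
function updateSessionAgent(sessionID: string, agent: string): void {
  sessionAgents.set(sessionID, agent);
}

// Pin atlas as soon as the message carries the marker, before any model request
// (and therefore any rate-limit error) can resolve the agent to prometheus.
function pinAgentForStartWork(sessionID: string, messageText: string): void {
  if (messageText.includes("<session-context>")) {
    updateSessionAgent(sessionID, "atlas");
  }
}
```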

Closes code-yeongyu#2060
Related: code-yeongyu#2062, code-yeongyu#2063

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants