fix(dispatch): hot-reload race + per-provider retry + RateLimitHandler trait#323
Open
Destynova2 wants to merge 1 commit intomainfrom
Open
fix(dispatch): hot-reload race + per-provider retry + RateLimitHandler trait#323Destynova2 wants to merge 1 commit intomainfrom
Destynova2 wants to merge 1 commit intomainfrom
Conversation
…r trait
Three related fixes around dispatch retry semantics and the hot-reload race:
1. Block hot-reload until validation completes. Both `/api/config/reload`
(HTTP) and the `grob/server/reload_config` JSON-RPC endpoint awaited
`validate_config()` *after* the atomic swap, so an invalid config could
serve traffic for several seconds. They now validate against the
candidate provider registry before swapping; failure returns 422 with
a list of broken router models and leaves the live snapshot intact, so
in-flight requests continue on the old config.
2. Per-provider `max_retries`. Add `max_retries: Option<u32>` to
`ProviderConfig` and a `provider_max_retries()` resolver that reads
the per-provider value or falls back to the global `MAX_RETRIES = 2`.
The dispatch retry loop in `src/server/dispatch/retry.rs` now consumes
this resolved budget so Anthropic can stay at 2 while OpenAI and
OpenRouter / DeepSeek can opt into 3.
3. `RateLimitHandler` trait. Centralise the 429/529/Anthropic-401 logic
that was duplicated across three sites in `retry.rs`. The trait is
implemented for `ProviderError`, exposes `is_rate_limit()` and a
future-facing `retry_after_ms()` hook, and replaces the inline
`matches!(e, ProviderError::ApiError { status: 429, .. })` checks.
Tests cover per-provider retry resolution (Anthropic = 2, OpenAI = 3,
OpenRouter = 3, missing → default, explicit 0), the rate-limit handler
across upstream variants, and the validation gate (empty / all-ok / any-ok
passes; broken-model detail surfacing). Full nextest workspace run is
green (1289 tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three related fixes around dispatch retry semantics and the hot-reload race, bundled into a single PR per the brief.
/api/config/reloadHTTP handler and thegrob/server/reload_configJSON-RPC method previously spawnedvalidate_config()as a background task after the atomic swap. An invalid config could therefore serve traffic for several seconds before the validation log line surfaced. Both endpoints nowawaitvalidation against the candidate provider registry before swapping; on failure they return a 4xx (HTTP422 Unprocessable Entity/ JSON-RPCERR_INTERNALwith detail) and leave the liveinnersnapshot untouched, so in-flight requests continue on the old config.max_retries— addedmax_retries: Option<u32>toProviderConfigand aprovider_max_retries()resolver that reads the per-provider value or falls back to the globalMAX_RETRIES = 2. The dispatch retry loop insrc/server/dispatch/retry.rsnow consumes this resolved budget so Anthropic can stay at 2 while OpenAI / OpenRouter / DeepSeek opt into 3 (declarative — no provider names hard-coded in the dispatch path).RateLimitHandlertrait — newsrc/server/dispatch/rate_limit.rsmodule centralises 429/529/Anthropic-401 detection. The trait is implemented forProviderError, exposesis_rate_limit()and a future-facingretry_after_ms()hook (currently returnsNonebecauseProviderErrordoes not retainRetry-Afterheaders — the hook is the explicit extension point for the post unified-error refactor). The three inlinematches!(e, ProviderError::ApiError { status: 429, .. })checks inretry.rsare replaced with singleerr.is_rate_limit()calls.Test plan
cargo check --tests --workspaceclean.cargo clippy --tests --workspace -- -D warningsclean.cargo fmt --all -- --checkclean.cargo nextest run --workspace— 1289 / 1289 passing locally.server::budget::tests::resolve_max_retries_*— Anthropic = 2, OpenAI = 3, OpenRouter = 3, missing → default, explicit `0`, multi-provider isolation.server::dispatch::rate_limit::tests::*— Anthropic / OpenAI / DeepSeek 429, Anthropic 529 overload, Anthropic 401-with-`rate_limit_error` payload, auth-401 / 5xx / non-API-error negatives.server::config_api::tests::*— empty / all-ok / any-ok validation passes; broken router model surfaces a rejection with detail and a `broken_models` JSON array.Notes for reviewers
fix/preset-mod-include-strper the brief. That branch has since been merged and deleted on the remote, so this PR targetsmaindirectly — the diff is identical becausefix/preset-mod-include-strwas already at the tip ofmain(ee43b24).src/server/dispatch/retry.rs. Rebasing on top of that PR is expected. If the unified error type exposes a richerRateLimitHandler::retry_after_ms()source (header-aware), pick that PR's API at merge time and keep the trait + per-provider budget plumbing from this PR.grob/server/reload_confighandler insrc/server/rpc/server_ns.rsso both reload surfaces share the validate-before-swap contract.🤖 Generated with Claude Code