
fix(dispatch): hot-reload race + per-provider retry + RateLimitHandler trait#323

Open
Destynova2 wants to merge 1 commit into main from fix/dispatch-retry-and-reload

@Destynova2
Contributor

Summary

Three related fixes around dispatch retry semantics and the hot-reload race, bundled into a single PR per the brief.

  • Hot-reload race fix: the /api/config/reload HTTP handler and the grob/server/reload_config JSON-RPC method previously spawned validate_config() as a background task after the atomic swap. As a result, an invalid config could serve traffic for several seconds before the validation log line surfaced. Both endpoints now await validation against the candidate provider registry before swapping. On failure, the HTTP handler returns 422 Unprocessable Entity and the JSON-RPC method returns an ERR_INTERNAL error; both include detail. The live inner snapshot is left untouched, so in-flight requests continue on the old config.
  • Per-provider max_retries: added max_retries: Option<u32> to ProviderConfig and a provider_max_retries() resolver that reads the per-provider value or falls back to the global MAX_RETRIES = 2. The dispatch retry loop in src/server/dispatch/retry.rs now consumes this resolved budget, so Anthropic can stay at 2 while OpenAI / OpenRouter / DeepSeek opt into 3. The budgets are declarative: no provider names are hard-coded in the dispatch path.
  • RateLimitHandler trait: a new src/server/dispatch/rate_limit.rs module centralises 429/529/Anthropic-401 detection. The trait is implemented for ProviderError and exposes is_rate_limit() plus a future-facing retry_after_ms() hook. The hook currently returns None because ProviderError does not retain Retry-After headers; it is the explicit extension point for when the unified-error refactor lands. The three inline matches!(e, ProviderError::ApiError { status: 429, .. }) checks in retry.rs are replaced with single err.is_rate_limit() calls.
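The following is a minimal, standard-library-only sketch of the validate-before-swap contract described in the first bullet. `Config`, `Rejection`, `Server`, and the `is_known` registry check are hypothetical stand-ins, not grob's actual types. An `RwLock<Arc<_>>` also stands in for whatever atomic-swap primitive the server really uses.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the config snapshot; only router models matter here.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    router_models: Vec<String>,
}

/// Hypothetical rejection listing the models that failed validation
/// (the PR surfaces these as a `broken_models` JSON array).
#[derive(Debug, PartialEq)]
struct Rejection {
    broken_models: Vec<String>,
}

/// Holds the live snapshot that in-flight requests read from.
struct Server {
    inner: RwLock<Arc<Config>>,
}

impl Server {
    /// Readers clone the Arc, so they keep whatever snapshot was live when they started.
    fn snapshot(&self) -> Arc<Config> {
        Arc::clone(&*self.inner.read().unwrap())
    }

    /// Validate the candidate first and swap only if validation passes.
    /// `is_known` is a hypothetical lookup against the candidate provider registry.
    fn reload(&self, candidate: Config, is_known: impl Fn(&str) -> bool) -> Result<(), Rejection> {
        let broken: Vec<String> = candidate
            .router_models
            .iter()
            .filter(|m| !is_known(m.as_str()))
            .cloned()
            .collect();
        if !broken.is_empty() {
            // The write lock is never taken, so the live snapshot is untouched.
            return Err(Rejection { broken_models: broken });
        }
        *self.inner.write().unwrap() = Arc::new(candidate);
        Ok(())
    }
}
```

The write lock is taken only after validation succeeds, so readers of `snapshot()` never see a rejected candidate.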

Test plan

  • cargo check --tests --workspace clean.
  • cargo clippy --tests --workspace -- -D warnings clean.
  • cargo fmt --all -- --check clean.
  • cargo nextest run --workspace: 1289 / 1289 passing locally.
  • New unit tests:
    • server::budget::tests::resolve_max_retries_* — Anthropic = 2, OpenAI = 3, OpenRouter = 3, missing → default, explicit `0`, multi-provider isolation.
    • server::dispatch::rate_limit::tests::* — Anthropic / OpenAI / DeepSeek 429, Anthropic 529 overload, Anthropic 401-with-`rate_limit_error` payload, auth-401 / 5xx / non-API-error negatives.
    • server::config_api::tests::* — empty / all-ok / any-ok validation passes; broken router model surfaces a rejection with detail and a `broken_models` JSON array.
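The `resolve_max_retries_*` cases listed above reduce to an `Option` fallback. This sketch assumes a trimmed-down `ProviderConfig` that holds only `name` and the new `max_retries` field. `provider_max_retries` mirrors the resolver's described behaviour, but its signature here and the `resolve_for` lookup helper are hypothetical.

```rust
/// Global fallback budget named in the PR.
const MAX_RETRIES: u32 = 2;

/// Hypothetical slice of `ProviderConfig`: a name plus the field this PR adds.
#[derive(Debug, Clone, Default)]
struct ProviderConfig {
    name: String,
    max_retries: Option<u32>,
}

/// The per-provider value wins (including an explicit 0); otherwise use the global default.
fn provider_max_retries(provider: &ProviderConfig) -> u32 {
    provider.max_retries.unwrap_or(MAX_RETRIES)
}

/// Hypothetical helper: resolve by provider name. Unknown providers get the default,
/// and each provider's setting affects only its own budget.
fn resolve_for(providers: &[ProviderConfig], name: &str) -> u32 {
    providers
        .iter()
        .find(|p| p.name == name)
        .map(provider_max_retries)
        .unwrap_or(MAX_RETRIES)
}
```

Keeping the field as `Option<u32>` keeps an explicit `0` (retries disabled) distinct from an unset value that should inherit the global default.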

Notes for reviewers

  • Originally targeted fix/preset-mod-include-str per the brief. That branch has since been merged and deleted on the remote, so this PR targets main directly — the diff is identical because fix/preset-mod-include-str was already at the tip of main (ee43b24).
  • Likely conflict with the parallel "unified error" PR which also touches src/server/dispatch/retry.rs. Rebasing on top of that PR is expected. If the unified error type exposes a richer RateLimitHandler::retry_after_ms() source (header-aware), pick that PR's API at merge time and keep the trait + per-provider budget plumbing from this PR.
  • The same race fix has been applied symmetrically to the JSON-RPC grob/server/reload_config handler in src/server/rpc/server_ns.rs so both reload surfaces share the validate-before-swap contract.
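The note about a header-aware `retry_after_ms()` implies a specific integration point. The sketch below shows how that hook could slot into a budgeted retry loop: a hint, when present, wins; otherwise the loop falls back to exponential backoff. The trait here is a local mirror of the one this PR describes. Several details are assumptions made to keep the example small and testable: the 500 ms base delay, retrying only rate-limit errors, and the injected `sleep` callback. The real loop in `retry.rs` may differ.

```rust
use std::time::Duration;

/// Local mirror of the trait; `retry_after_ms` defaults to None, as it does today.
trait RateLimitHandler {
    fn is_rate_limit(&self) -> bool;
    fn retry_after_ms(&self) -> Option<u64> {
        None
    }
}

/// Delay before the next attempt: prefer a header-derived hint, else exponential backoff.
fn backoff_delay<E: RateLimitHandler>(err: &E, attempt: u32, base_ms: u64) -> Duration {
    match err.retry_after_ms() {
        Some(ms) => Duration::from_millis(ms),
        // Cap the shift so the multiplier cannot overflow.
        None => Duration::from_millis(base_ms.saturating_mul(1u64 << attempt.min(16))),
    }
}

/// Run `op` at most `1 + max_retries` times, retrying only rate-limit errors (an assumption).
fn with_retries<T, E: RateLimitHandler>(
    max_retries: u32,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            Err(e) if e.is_rate_limit() && attempt < max_retries => {
                sleep(backoff_delay(&e, attempt, 500));
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

Injecting `sleep` keeps the loop deterministic under test. An async dispatcher would await a timer instead.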

🤖 Generated with Claude Code


Three related fixes around dispatch retry semantics and the hot-reload race:

1. Block hot-reload until validation completes. Both `/api/config/reload`
   (HTTP) and the `grob/server/reload_config` JSON-RPC endpoint ran
   `validate_config()` only *after* the atomic swap, so an invalid config could
   serve traffic for several seconds. They now validate against the
   candidate provider registry before swapping. On failure, HTTP returns 422
   with a list of broken router models, JSON-RPC returns an ERR_INTERNAL
   error with detail, and the live snapshot stays intact, so in-flight
   requests continue on the old config.

2. Per-provider `max_retries`. Add `max_retries: Option<u32>` to
   `ProviderConfig` and a `provider_max_retries()` resolver that reads
   the per-provider value or falls back to the global `MAX_RETRIES = 2`.
   The dispatch retry loop in `src/server/dispatch/retry.rs` now consumes
   this resolved budget so Anthropic can stay at 2 while OpenAI,
   OpenRouter, and DeepSeek can opt into 3.

3. `RateLimitHandler` trait. Centralise the 429/529/Anthropic-401 logic
   that was duplicated across three sites in `retry.rs`. The trait is
   implemented for `ProviderError`, exposes `is_rate_limit()` and a
   future-facing `retry_after_ms()` hook, and replaces the inline
   `matches!(e, ProviderError::ApiError { status: 429, .. })` checks.
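Here is a sketch of the centralised detection described in item 3. The `ProviderError` below is a hypothetical two-variant stand-in. Apart from `status`, the `ApiError` fields are assumed; here there is a single raw `message` string. Recognising the Anthropic 401 case by a `rate_limit_error` substring in that message is also an assumption about how the real check works.

```rust
/// Hypothetical, trimmed-down stand-in for `ProviderError`.
#[derive(Debug)]
enum ProviderError {
    ApiError { status: u16, message: String },
    Network(String),
}

trait RateLimitHandler {
    fn is_rate_limit(&self) -> bool;
    fn retry_after_ms(&self) -> Option<u64>;
}

impl RateLimitHandler for ProviderError {
    fn is_rate_limit(&self) -> bool {
        match self {
            // 429 is the standard rate-limit status; 529 is Anthropic's overload status.
            ProviderError::ApiError { status: 429 | 529, .. } => true,
            // A 401 counts only when its payload carries `rate_limit_error`;
            // an ordinary auth failure must not be retried.
            ProviderError::ApiError { status: 401, message } => message.contains("rate_limit_error"),
            _ => false,
        }
    }

    fn retry_after_ms(&self) -> Option<u64> {
        // No Retry-After header is retained on the error, so there is no hint to report yet.
        None
    }
}
```

Call sites in the retry loop then shrink to a single `err.is_rate_limit()` check, whatever the upstream variant.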

Tests cover per-provider retry resolution (Anthropic = 2, OpenAI = 3,
OpenRouter = 3, missing → default, explicit 0), the rate-limit handler
across upstream variants, and the validation gate (empty / all-ok / any-ok
passes; broken-model detail surfacing). Full nextest workspace run is
green (1289 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Destynova2 enabled auto-merge on April 28, 2026 at 20:46.