fix(dispatch): hot-reload race + per-provider retry + RateLimitHandler trait by Destynova2 · Pull Request #323 · azerozero/grob

Destynova2 · 2026-04-28T20:46:27Z

Summary

Three related fixes around dispatch retry semantics and the hot-reload race, bundled into a single PR per the brief.

- Hot-reload race fix: the /api/config/reload HTTP handler and the grob/server/reload_config JSON-RPC method previously spawned validate_config() as a background task after the atomic swap. An invalid config could therefore serve traffic for several seconds before the validation log line surfaced. Both endpoints now await validation against the candidate provider registry before swapping; on failure they return an error (HTTP 422 Unprocessable Entity, or JSON-RPC ERR_INTERNAL with detail) and leave the live inner snapshot untouched, so in-flight requests continue on the old config.
- Per-provider max_retries: added max_retries: Option<u32> to ProviderConfig and a provider_max_retries() resolver that reads the per-provider value or falls back to the global MAX_RETRIES = 2. The dispatch retry loop in src/server/dispatch/retry.rs now consumes this resolved budget, so Anthropic can stay at 2 while OpenAI / OpenRouter / DeepSeek opt into 3. This is declarative: no provider names are hard-coded in the dispatch path.
- RateLimitHandler trait: the new src/server/dispatch/rate_limit.rs module centralises 429/529/Anthropic-401 detection. The trait is implemented for ProviderError and exposes is_rate_limit() plus a future-facing retry_after_ms() hook. The hook currently returns None because ProviderError does not retain Retry-After headers; it is the explicit extension point for use after the unified-error refactor. The three inline matches!(e, ProviderError::ApiError { status: 429, .. }) checks in retry.rs are replaced with single err.is_rate_limit() calls.
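
For reviewers who want the shape of the trait without opening the diff, here is a minimal sketch of the rate-limit detection described above. It is not the code from src/server/dispatch/rate_limit.rs: the `message` field, the `Network` variant, the `Option<u64>` return type, and the substring check on `rate_limit_error` are all assumptions made for illustration. Only the `ApiError { status, .. }` shape and the two method names come from the description.

```rust
/// Hypothetical, simplified stand-in for the real `ProviderError`.
/// Only `ApiError { status, .. }` is taken from the PR text; the rest is assumed.
#[derive(Debug)]
pub enum ProviderError {
    ApiError { status: u16, message: String },
    Network(String),
}

/// Single place to decide whether an upstream error is a rate limit.
pub trait RateLimitHandler {
    /// True for 429, 529 (overload), and a 401 whose payload names a rate limit.
    fn is_rate_limit(&self) -> bool;

    /// Extension point for header-aware backoff. Defaults to `None` because
    /// this sketch, like the PR, keeps no Retry-After information.
    fn retry_after_ms(&self) -> Option<u64> {
        None
    }
}

impl RateLimitHandler for ProviderError {
    fn is_rate_limit(&self) -> bool {
        match self {
            // Plain rate limit and overload statuses.
            ProviderError::ApiError { status: 429 | 529, .. } => true,
            // Assumed detection rule: a 401 counts only if the payload
            // mentions `rate_limit_error`; an ordinary auth failure does not.
            ProviderError::ApiError { status: 401, message } => {
                message.contains("rate_limit_error")
            }
            // Other API statuses and non-API errors are never rate limits.
            _ => false,
        }
    }
}
```

With this in place, a call site in the retry loop reduces to `if err.is_rate_limit() { ... }` instead of repeating the status match.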

Test plan

- `cargo check --tests --workspace`: clean.
- `cargo clippy --tests --workspace -- -D warnings`: clean.
- `cargo fmt --all -- --check`: clean.
- `cargo nextest run --workspace`: 1289 / 1289 passing locally.
New unit tests:
- server::budget::tests::resolve_max_retries_* — Anthropic = 2, OpenAI = 3, OpenRouter = 3, missing → default, explicit `0`, multi-provider isolation.
- server::dispatch::rate_limit::tests::* — Anthropic / OpenAI / DeepSeek 429, Anthropic 529 overload, Anthropic 401-with-`rate_limit_error` payload, auth-401 / 5xx / non-API-error negatives.
- server::config_api::tests::* — empty / all-ok / any-ok validation passes; broken router model surfaces a rejection with detail and a `broken_models` JSON array.
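
The `resolve_max_retries_*` cases listed above can be illustrated with a small sketch of the fallback rule. This is an assumption-laden reconstruction, not the resolver from the PR: the `name` field, the slice-based lookup, and the function signature are hypothetical. Only `max_retries: Option<u32>`, `provider_max_retries()`, and `MAX_RETRIES = 2` come from the description.

```rust
/// Global default retry budget, as stated in the PR description.
pub const MAX_RETRIES: u32 = 2;

/// Hypothetical, trimmed-down provider config; the real struct has more fields.
#[derive(Debug, Clone)]
pub struct ProviderConfig {
    pub name: String,
    /// `None` means "use the global default"; `Some(0)` means "never retry".
    pub max_retries: Option<u32>,
}

/// Resolves the retry budget for `name` (assumed signature).
/// An unknown provider or an unset value falls back to `MAX_RETRIES`.
pub fn provider_max_retries(providers: &[ProviderConfig], name: &str) -> u32 {
    providers
        .iter()
        .find(|p| p.name == name)
        .and_then(|p| p.max_retries)
        // Only `None` falls back, so an explicit `Some(0)` is preserved.
        .unwrap_or(MAX_RETRIES)
}
```

Using `Option<u32>` rather than a plain integer is what lets an explicit `0` be distinguished from "not configured", which is the distinction the explicit-`0` test case exercises.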

Notes for reviewers

Originally targeted fix/preset-mod-include-str per the brief. That branch has since been merged and deleted on the remote, so this PR targets main directly — the diff is identical because fix/preset-mod-include-str was already at the tip of main (ee43b24).
Likely conflict with the parallel "unified error" PR which also touches src/server/dispatch/retry.rs. Rebasing on top of that PR is expected. If the unified error type exposes a richer RateLimitHandler::retry_after_ms() source (header-aware), pick that PR's API at merge time and keep the trait + per-provider budget plumbing from this PR.
The same race fix has been applied symmetrically to the JSON-RPC grob/server/reload_config handler in src/server/rpc/server_ns.rs so both reload surfaces share the validate-before-swap contract.
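
As a rough illustration of the validate-before-swap contract shared by both reload surfaces, the sketch below uses a synchronous `RwLock<Arc<Config>>`. Everything in it is hypothetical: the real handlers are async, the `Config`, `RouterModel`, `LiveConfig`, and `ReloadRejected` types are invented here, and the validity rule (a router model must reference a registered provider) is a placeholder for whatever `validate_config()` actually checks.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical router entry: a model name and the provider it routes to.
#[derive(Debug, Clone)]
pub struct RouterModel {
    pub name: String,
    pub provider: String,
}

/// Hypothetical config snapshot.
#[derive(Debug, Clone)]
pub struct Config {
    pub providers: Vec<String>,
    pub router_models: Vec<RouterModel>,
}

/// Rejection carrying the names of the router models that failed validation.
#[derive(Debug, PartialEq)]
pub struct ReloadRejected {
    pub broken_models: Vec<String>,
}

/// Placeholder rule: every router model must name a provider in the candidate registry.
pub fn validate_config(candidate: &Config) -> Result<(), ReloadRejected> {
    let broken_models: Vec<String> = candidate
        .router_models
        .iter()
        .filter(|m| !candidate.providers.contains(&m.provider))
        .map(|m| m.name.clone())
        .collect();
    if broken_models.is_empty() {
        Ok(())
    } else {
        Err(ReloadRejected { broken_models })
    }
}

/// Holder for the live snapshot; readers clone the `Arc` and never see a partial update.
pub struct LiveConfig {
    inner: RwLock<Arc<Config>>,
}

impl LiveConfig {
    pub fn new(initial: Config) -> Self {
        Self { inner: RwLock::new(Arc::new(initial)) }
    }

    /// Returns the snapshot that in-flight requests would keep using.
    pub fn snapshot(&self) -> Arc<Config> {
        let guard = self.inner.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Validates the candidate first; the swap happens only on success,
    /// so a rejected candidate never becomes visible to readers.
    pub fn reload(&self, candidate: Config) -> Result<(), ReloadRejected> {
        validate_config(&candidate)?;
        *self.inner.write().unwrap() = Arc::new(candidate);
        Ok(())
    }
}
```

In this sketch an HTTP handler would map `Err(ReloadRejected { .. })` to a 422 response and a JSON-RPC handler to an error object, while the `Ok(())` path is the only one that replaces the snapshot.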

🤖 Generated with Claude Code

…r trait

Three related fixes around dispatch retry semantics and the hot-reload race:

1. Block hot-reload until validation completes. Both `/api/config/reload` (HTTP) and the `grob/server/reload_config` JSON-RPC endpoint awaited `validate_config()` *after* the atomic swap, so an invalid config could serve traffic for several seconds. They now validate against the candidate provider registry before swapping; failure returns 422 with a list of broken router models and leaves the live snapshot intact, so in-flight requests continue on the old config.

2. Per-provider `max_retries`. Add `max_retries: Option<u32>` to `ProviderConfig` and a `provider_max_retries()` resolver that reads the per-provider value or falls back to the global `MAX_RETRIES = 2`. The dispatch retry loop in `src/server/dispatch/retry.rs` now consumes this resolved budget so Anthropic can stay at 2 while OpenAI and OpenRouter / DeepSeek can opt into 3.

3. `RateLimitHandler` trait. Centralise the 429/529/Anthropic-401 logic that was duplicated across three sites in `retry.rs`. The trait is implemented for `ProviderError`, exposes `is_rate_limit()` and a future-facing `retry_after_ms()` hook, and replaces the inline `matches!(e, ProviderError::ApiError { status: 429, .. })` checks.

Tests cover per-provider retry resolution (Anthropic = 2, OpenAI = 3, OpenRouter = 3, missing → default, explicit 0), the rate-limit handler across upstream variants, and the validation gate (empty / all-ok / any-ok passes; broken-model detail surfacing). Full nextest workspace run is green (1289 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Destynova2 enabled auto-merge April 28, 2026 20:46

Destynova2 wants to merge 1 commit into main from fix/dispatch-retry-and-reload
