
Add LLM API retry engine for transient failures#211

Open
nhicks00 wants to merge 2 commits into mpfaffenberger:main from nhicks00:feature/210-llm-api-retry-engine

Conversation

nhicks00 (Collaborator) commented Feb 27, 2026

Summary

Adds llm_run_with_retry() — a retry wrapper around all three pydantic_agent.run()
call sites in base_agent.py. Handles 429 rate limits, 529 overloads, 5xx server
errors, network failures, and streaming errors with exponential backoff.

  • New module: code_puppy/llm_retry.py (357 lines)
  • Modified: base_agent.py (3 call sites wrapped), callbacks.py (+2 phase types)
  • Tests: 60 tests, 97% coverage on llm_retry.py
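
For readers skimming the diff, the engine's core shape condenses to a few lines. The names below mirror the PR (`llm_run_with_retry`, `LLMRetryConfig`, `RetryExhaustedError`), but the bodies are an illustrative sketch of the pattern, not the actual 357-line implementation:

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class LLMRetryConfig:
    max_retries: int = 10
    cancel_event: Optional[asyncio.Event] = None


class RetryExhaustedError(Exception):
    def __init__(self, message: str, original: Optional[Exception]):
        super().__init__(message)
        self.original = original


def _is_retryable(error: Exception) -> bool:
    # Simplified stand-in for the module's classification heuristics.
    status = getattr(error, "status_code", None)
    return status in (408, 409, 429) or (status is not None and status >= 500)


async def llm_run_with_retry(
    coro_factory: Callable[[], Any],
    config: Optional[LLMRetryConfig] = None,
) -> Any:
    config = config or LLMRetryConfig()
    last_error: Optional[Exception] = None
    for attempt in range(1, config.max_retries + 2):
        try:
            # The factory re-creates the coroutine on every attempt,
            # since a coroutine object can only be awaited once.
            return await coro_factory()
        except Exception as error:
            if not _is_retryable(error):
                raise  # fatal: 400/402/404/422, schema errors, etc.
            last_error = error
            if attempt > config.max_retries:
                break
            delay = min(0.5 * 2 ** (attempt - 1), 32.0)
            await asyncio.sleep(delay * random.uniform(0.75, 1.25))
    raise RetryExhaustedError("LLM retries exhausted", last_error) from last_error
```

The factory-not-coroutine signature matters: passing `pydantic_agent.run(...)` directly would hand over an already-consumed coroutine on the second attempt.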

What it does

| Error | Behavior |
| --- | --- |
| 429 rate limit | Retry with backoff (0.5s → 1s → 2s → ... → 32s cap) |
| 529 overloaded | Retry; short-circuit after 3 consecutive |
| 500/502/503/504 | Retry with backoff |
| Network/DNS/timeout | Retry (walks the full `__cause__` chain through pydantic_ai → SDK → httpx → OSError) |
| Streaming failure | Retry ("Streamed response ended" from pydantic_ai) |
| 400/402/404/422 | Fatal (no retry) |
| Schema/validation | Fatal (no retry) |
| Retry-After header | Honored over computed backoff |
| x-should-retry header | Authoritative when present |

Configurable via PUPPY_MAX_LLM_RETRIES env var (default 10).
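
A sketch of how that env var might be resolved, including the negative-value clamping added in the follow-up commit (illustrative; the module's actual `_resolve_max_retries` may differ):

```python
import os

_DEFAULT_MAX_RETRIES = 10


def resolve_max_retries() -> int:
    """Read PUPPY_MAX_LLM_RETRIES, falling back to the default when
    the variable is unset, unparsable, or non-positive (the follow-up
    commit clamps negative values to the default)."""
    raw = os.environ.get("PUPPY_MAX_LLM_RETRIES")
    if raw is None:
        return _DEFAULT_MAX_RETRIES
    try:
        value = int(raw)
    except ValueError:
        return _DEFAULT_MAX_RETRIES
    return value if value > 0 else _DEFAULT_MAX_RETRIES
```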

Walmart Fork Sync Notes

This PR modifies core code that diverges from the Walmart fork. Here's what
the sync merge looks like:

callbacks.py — Additive merge, no conflicts expected. Walmart fork has
run_shell_command_output (not in OSS); this PR adds api_retry_start /
api_retry_end. Keep all entries from both sides.

base_agent.py — Merge conflict expected at the pydantic_agent.run() call
sites. The Walmart fork wraps these with _run_with_streaming_retry(). Resolution:

  1. Delete _run_with_streaming_retry(), MAX_STREAMING_RETRIES, and
    STREAMING_RETRY_DELAYS entirely
  2. Keep the llm_run_with_retry() wrapper from this PR — it handles the same
    streaming error patterns ("streamed response ended") plus HTTP-level retries
    that _run_with_streaming_retry never covered
  3. Rename event_stream_handler -> stream_handler to match Walmart fork naming
  4. Carry over the except* RetryExhaustedError handler into the fork's exception chain

llm_retry.py and tests/test_llm_retry.py — New files, land cleanly.

Design Note: Why core, not a plugin

We considered making retry a plugin via a wrap_llm_call callback, which would
let the Walmart fork register a gateway-aware retry strategy without merge conflicts.
We opted against it because the existing callback system is fire-and-forget — it
doesn't support middleware-style coroutine wrapping without significant plumbing
changes. A utility function in core is the simplest approach that works.

For Walmart-specific gateway intelligence (e.g., detecting Vertex AI per-minute
quota windows, per-model quota tracking), a Walmart fork plugin can register an
api_retry_start callback to layer on top of the core retry engine. The hooks
are already in place for this — no core changes needed.
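
As a concrete illustration of the hook mechanism, here is a self-contained sketch of the fire-and-forget callback pattern the design note describes. The phase names match this PR, but `register_callback`/`fire` are illustrative stand-ins for the actual `code_puppy.callbacks` internals:

```python
from typing import Any, Callable, Dict, List

# Registry keyed by phase; mirrors the two phases this PR adds.
_callbacks: Dict[str, List[Callable[..., Any]]] = {
    "api_retry_start": [],
    "api_retry_end": [],
}


def register_callback(phase: str, fn: Callable[..., Any]) -> None:
    _callbacks[phase].append(fn)


def fire(phase: str, **payload: Any) -> None:
    # Fire-and-forget: hooks observe events but cannot wrap or alter
    # the retried coroutine, which is why retry itself lives in core.
    for fn in _callbacks[phase]:
        try:
            fn(**payload)
        except Exception:
            pass  # a failing hook must not break the retry engine


events: list = []
register_callback(
    "api_retry_start",
    lambda **kw: events.append((kw["attempt"], kw["delay_ms"])),
)
fire("api_retry_start", error=None, attempt=1, delay_ms=500, max_retries=10)
```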

Test plan

  • 60 unit tests pass (simulated 401, 400, 429, 503, 529, streaming, wrapped ConnectionError, cancellation)
  • Manual test: live API call (Synthetic/Kimi-K2.5) — happy path works, retry wrapper invisible
  • Manual test: simulated gateway 429 (ModelHTTPError(status_code=429)) — retries and recovers
  • Full test suite: 10,371 pass, 0 new failures
  • ruff format / ruff check clean
  • Verified error classification against all pydantic_ai providers (Anthropic, OpenAI, Gemini)

Resolves #210

Summary by CodeRabbit

  • New Features

    • Added robust automatic retrying for LLM API calls with exponential backoff, jitter, overload handling, and optional cancellation.
    • New retry lifecycle events (retry start/end) for tooling and integrations.
    • Improved user-facing messaging when retry attempts are exhausted or APIs are under load.
  • Tests

    • Added comprehensive tests covering retry logic, backoff behavior, cancellation, and error classification.

Add llm_run_with_retry() that wraps pydantic_agent.run() calls with
retry logic for transient API failures (429, 529, 5xx, network errors,
streaming failures).

Key features:
- Exponential backoff with 25% jitter, 500ms base, 32s cap
- Retry-After header takes absolute priority
- 529 consecutive overload detection (short-circuits after 3)
- Abort-aware sleep via asyncio.Event for user cancellation
- Streaming error detection (handles pydantic_ai stream failures)
- Full __cause__ chain walker for wrapped network errors
- api_retry_start / api_retry_end callback hooks
- PUPPY_MAX_LLM_RETRIES env var override (default 10)

Walmart fork sync notes:
- callbacks.py: additive merge — keep run_shell_command_output,
  add api_retry_start/api_retry_end
- base_agent.py: delete _run_with_streaming_retry() entirely,
  replace with llm_run_with_retry(). Rename event_stream_handler
  -> stream_handler to match fork naming.
- llm_retry.py and tests/test_llm_retry.py are clean file adds

Resolves: mpfaffenberger#210

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Adds a new LLM API retry engine and integrates it into agent execution: introduces code_puppy/llm_retry.py, wires llm_run_with_retry into base_agent.py call sites, and extends callback phases for retry lifecycle notifications.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| LLM Retry Engine (`code_puppy/llm_retry.py`) | New module implementing LLM retry logic with exponential backoff, jitter, Retry-After handling, retryability heuristics, cancellable sleeps, `LLMRetryConfig`, `RetryExhaustedError`, `is_retryable()`, and the `llm_run_with_retry()` coroutine. |
| Retry Integration (`code_puppy/agents/base_agent.py`) | Imports `LLMRetryConfig`, `RetryExhaustedError`, and `llm_run_with_retry`; replaces direct `pydantic_agent.run` calls with the retry wrapper across DBOS and non-DBOS paths, initializes retry config, and handles `RetryExhaustedError` alongside existing usage-limit errors. |
| Callback Support (`code_puppy/callbacks.py`) | Extends `PhaseType` with `"api_retry_start"` and `"api_retry_end"` and adds corresponding keys to `_callbacks` for registering retry lifecycle hooks. |
| Tests (`tests/test_llm_retry.py`) | New comprehensive test suite covering helpers, backoff computation, retryable classification, overload handling, cancellable sleep, callback invocation, env-driven config, success/retry/exhaustion scenarios, and cancellation behavior. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller as Agent Caller
    participant RetryWrapper as llm_run_with_retry
    participant AgentRun as pydantic_agent.run
    participant API as LLM API
    participant Callbacks as Callbacks

    Caller->>RetryWrapper: llm_run_with_retry(coro_factory, config)
    loop Retry Loop (attempts ≤ max_retries)
        RetryWrapper->>AgentRun: Invoke coro_factory() (Attempt N)
        AgentRun->>API: LLM request
        alt Success
            API-->>AgentRun: Response
            AgentRun-->>RetryWrapper: Result
            RetryWrapper->>Callbacks: api_retry_end(total_attempts)
            RetryWrapper-->>Caller: Return result
        else Retryable Error
            API-->>AgentRun: Transient error (429/5xx/timeout/overload)
            AgentRun-->>RetryWrapper: Exception
            RetryWrapper->>Callbacks: api_retry_start(error, attempt, delay_ms, max_retries)
            RetryWrapper->>RetryWrapper: Compute delay (Retry-After or exp+jit)
            alt Cancel Event Set
                RetryWrapper-->>Caller: Propagate CancelledError
            else Sleep then retry
                RetryWrapper->>RetryWrapper: Sleep(delay_ms) (cancellable)
            end
        else Non-retryable / Exhausted
            API-->>AgentRun: Fatal error or retries exhausted
            AgentRun-->>RetryWrapper: Exception
            RetryWrapper-->>Caller: Raise RetryExhaustedError (with original)
        end
    end
```
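
The diagram's transient-error branch depends on the error classification, and for wrapped network failures the PR describes walking the exception's `__cause__` chain. A simplified stand-in for that walker (the real module presumably also matches httpx/SDK exception types):

```python
def is_wrapped_network_error(error: BaseException) -> bool:
    """Walk the __cause__/__context__ chain looking for a low-level
    network failure. OSError covers ConnectionError, socket timeouts,
    and DNS errors; the guard against revisiting an exception keeps
    the walk safe on cyclic chains."""
    seen: set[int] = set()
    current: BaseException | None = error
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if isinstance(current, OSError):
            return True
        current = current.__cause__ or current.__context__
    return False
```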

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • mpfaffenberger

Poem

🐰 Hops through code with jittered cheer,

Retries that wait — then success is near,
Callbacks hum as attempts ascend,
Until the API grants the end,
A rabbit cheers: retry complete!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.82%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: adding an LLM API retry engine for transient failures. This directly aligns with the primary objective of implementing exponential-backoff retry logic wrapping pydantic_agent.run() calls. |
| Linked Issues check | ✅ Passed | The PR fully implements the core objectives from issue #210: exponential backoff formula, retryable error classification, Retry-After header handling, max_retries configuration, abort-aware sleep, callback hooks (api_retry_start/api_retry_end), and integration into base_agent.py with comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: llm_retry.py implements the retry engine, callbacks.py adds the api_retry_start/api_retry_end phases, base_agent.py integrates the retry wrapper, and test_llm_retry.py provides comprehensive unit tests. No unrelated modifications detected. |





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
code_puppy/agents/base_agent.py (1)

1909-1910: Remove duplicate _toolsets assignment.

Line 1910 repeats the same assignment from Line 1909.

✂️ Tiny cleanup

```diff
                     original_toolsets = pydantic_agent._toolsets
                     pydantic_agent._toolsets = original_toolsets + self._mcp_servers
-                    pydantic_agent._toolsets = original_toolsets + self._mcp_servers
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_puppy/agents/base_agent.py` around lines 1909 - 1910, Remove the
duplicate assignment to pydantic_agent._toolsets: there are two identical lines
setting pydantic_agent._toolsets = original_toolsets + self._mcp_servers; keep a
single assignment and delete the redundant one so only one update to
pydantic_agent._toolsets remains (refer to pydantic_agent, _toolsets,
original_toolsets, and self._mcp_servers to locate the lines).
tests/test_llm_retry.py (1)

385-399: test_retry_after_header_respected does not currently prove header precedence.

Using retry-after: "0" plus no _compute_backoff argument assertion allows this test to pass even if header extraction regresses.

✅ Make the test assert the actual contract

```diff
     async def test_retry_after_header_respected(self):
         """Retry-After header value flows into backoff calculation."""
         call_count = 0

         async def factory():
             nonlocal call_count
             call_count += 1
             if call_count == 1:
-                raise _make_api_error(429, headers={"retry-after": "0"})
+                raise _make_api_error(429, headers={"retry-after": "1.5"})
             return "ok"

-        # We want to verify the header was extracted, not that we waited
-        result = await llm_run_with_retry(factory, config=LLMRetryConfig(max_retries=3))
+        with patch("code_puppy.llm_retry._compute_backoff", return_value=0.001) as backoff:
+            result = await llm_run_with_retry(
+                factory, config=LLMRetryConfig(max_retries=3)
+            )
+        backoff.assert_called_once_with(1, 1.5)
         assert result == "ok"
         assert call_count == 2
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_llm_retry.py` around lines 385 - 399, Update
test_retry_after_header_respected to assert header precedence by injecting a
custom _compute_backoff that would return a different backoff than the
Retry-After header so the test fails if header extraction regresses: have the
factory raise a 429 with a non-zero "retry-after" value, call
llm_run_with_retry(factory, config=LLMRetryConfig(max_retries=3),
_compute_backoff=<custom_callable>) where <custom_callable> returns a sentinel
backoff, and assert the retry used the header-derived value (e.g., by observing
call_count==2 and/or by verifying that the computed backoff came from the header
rather than the injected _compute_backoff). Ensure you reference
test_retry_after_header_respected, llm_run_with_retry, _compute_backoff, and
LLMRetryConfig when making the change.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed9b088 and 77bec84.

📒 Files selected for processing (4)
  • code_puppy/agents/base_agent.py
  • code_puppy/callbacks.py
  • code_puppy/llm_retry.py
  • tests/test_llm_retry.py

Comment thread code_puppy/llm_retry.py
Comment on lines +51 to +57
```python
@dataclass
class LLMRetryConfig:
    """Configuration for the LLM retry engine."""

    max_retries: int = field(default_factory=_resolve_max_retries)
    cancel_event: Optional[asyncio.Event] = None
```

@coderabbitai coderabbitai bot Feb 27, 2026


⚠️ Potential issue | 🟠 Major

Consecutive-529 short-circuit cannot trigger fallback behavior.

Line 312 always raises RetryExhaustedError, and LLMRetryConfig (Line 55-Line 56) has no fallback configuration, so overload-threshold fallback cannot be enabled.

🧩 Suggested API extension

```diff
 @dataclass
 class LLMRetryConfig:
     """Configuration for the LLM retry engine."""

     max_retries: int = field(default_factory=_resolve_max_retries)
     cancel_event: Optional[asyncio.Event] = None
+    fallback_model: Optional[str] = None
+    max_consecutive_overloads: int = _MAX_CONSECUTIVE_OVERLOADS
+
+
+class FallbackTriggeredError(Exception):
+    def __init__(self, fallback_model: str, original_error: Exception):
+        super().__init__(f"Switch to fallback model: {fallback_model}")
+        self.fallback_model = fallback_model
+        self.original_error = original_error
@@
-                if overload_hits >= _MAX_CONSECUTIVE_OVERLOADS:
+                if overload_hits >= config.max_consecutive_overloads:
+                    if config.fallback_model:
+                        raise FallbackTriggeredError(
+                            config.fallback_model, error
+                        ) from error
                     raise RetryExhaustedError(
-                        f"API returned {_MAX_CONSECUTIVE_OVERLOADS} consecutive "
+                        f"API returned {config.max_consecutive_overloads} consecutive "
                         f"overloaded errors",
                         error,
                     ) from error
```

Also applies to: 307-317

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_puppy/llm_retry.py` around lines 51 - 57, LLMRetryConfig currently only
exposes max_retries and cancel_event, so the retry loop always raises
RetryExhaustedError (e.g., in the function that currently forces that at lines
~307-317) and there's no way to enable overload-threshold fallback; add a
fallback configuration to LLMRetryConfig (for example a fallback_strategy enum
or a boolean like enable_overload_fallback plus optional fallback parameters
such as overload_threshold and fallback_backoff) and update the retry logic (the
routine that raises RetryExhaustedError) to consult this new config: when
overload conditions are met and fallback is enabled, execute the configured
fallback behavior instead of unconditionally raising RetryExhaustedError. Ensure
references to LLMRetryConfig, max_retries, cancel_event, and the retry
loop/RetryExhaustedError raise site are updated to use the new fields.

nhicks00 (Collaborator, Author) replied:


Intentionally not addressed — YAGNI. We originally implemented FallbackTriggeredError and fallback_model config, then deliberately removed them during code review. No fallback model infrastructure exists in Code Puppy today, and adding speculative plumbing for a feature that doesn't exist yet violates YAGNI. When model fallback is needed, it can be added as a focused follow-up with its own tests and config surface.

coderabbitai bot replied:

@nhicks00, understood! Thank you for the context. Applying YAGNI to avoid speculative plumbing for non-existent infrastructure is the right call—better to add the fallback config surface when the actual fallback model logic lands. I've noted this decision for future reference.


✏️ Learnings added
Learnt from: nhicks00
Repo: mpfaffenberger/code_puppy PR: 211
File: code_puppy/llm_retry.py:59-65
Timestamp: 2026-02-27T03:54:07.263Z
Learning: In code_puppy/llm_retry.py, FallbackTriggeredError and fallback_model configuration were intentionally removed during review. No fallback model infrastructure exists in Code Puppy today, and adding speculative plumbing violates YAGNI. Fallback features should be added later as a focused follow-up when needed.


Comment thread code_puppy/llm_retry.py
Comment on lines +176 to +194
```python
status = _get_status_code(error)
if status is None:
    # No status code and not a recognized transient pattern — don't retry
    # blindly. Only specifically-identified patterns above are retried.
    return False

if status == 408:
    return True  # Request Timeout
if status == 409:
    return True  # Conflict
if status == 429:
    return True  # Rate Limit
if status == 401:
    return True  # Unauthorized (token may need refresh)
if status >= 500:
    return True  # Server errors

# 400 (non-overflow), 402, 403, 404, 422, etc. — fatal
return False
```

@coderabbitai coderabbitai bot Feb 27, 2026


⚠️ Potential issue | 🟠 Major

Missing context-overflow recovery path for retryable 400 overflow errors.

Line 193 treats 400 as fatal, and Line 328 immediately exits. The required context-overflow handling (parse inputTokens/contextLimit, compute available tokens, set max_tokens=max(3000, available), retry immediately without sleep) is not implemented.

Also applies to: 327-333

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_puppy/llm_retry.py` around lines 176 - 194, The retry logic currently
treats status 400 as fatal; update the function that uses _get_status_code (the
is_retryable / should_retry path around _get_status_code(error)) to detect the
specific context-overflow 400 response and implement the context-overflow
recovery: when the error payload indicates inputTokens/contextLimit overflow,
parse inputTokens and contextLimit from the error, compute available =
contextLimit - inputTokens, set the retry request's max_tokens to max(3000,
available) and flag this retry to happen immediately (no sleep/backoff), then
requeue the request; leave other 400s unchanged (still fatal). Ensure this same
handling is added where the code currently exits on 400 in the other retry
branch referenced (the block around the immediate-exit at the other use of
_get_status_code).

nhicks00 (Collaborator, Author) replied:


Intentionally not addressed — out of scope. Context-overflow recovery (detecting 400s caused by exceeding context window, then auto-reducing max_tokens or truncating prompts) is a new feature, not a bug in the retry engine. It would require token counting logic, model-specific context limits, and prompt truncation strategies — none of which belong in a retry PR. This would be a good candidate for a separate issue and focused follow-up PR.

coderabbitai bot replied:

@nhicks00, that's a reasonable architectural decision. You're right that context-overflow recovery involves significant complexity (token counting, model-specific context windows, prompt truncation strategies) that would expand the scope of this retry engine PR considerably.

While issue #210 did reference context-overflow handling, your core retry engine implementation successfully addresses the primary objective: robust exponential backoff and transient failure handling for the identified error classes. Separating the context-overflow feature into a focused follow-up PR will allow for proper design and testing of that specific capability.

Would you like me to open a follow-up issue to track the context-overflow recovery feature? I can draft it with the specific requirements (detect context-window 400s, parse token limits, compute available tokens, adjust max_tokens, immediate retry) and link it back to this PR and #210 for context.

- Clamp negative PUPPY_MAX_LLM_RETRIES values to default with warning
- Tighten test_retry_after_header_respected to assert _compute_backoff
  receives the parsed header value (1.5), not just that retry occurred
- Add test_config_negative_env test

Addresses CodeRabbit review findings mpfaffenberger#1 and mpfaffenberger#5 on PR mpfaffenberger#211.

CodeRabbit findings not addressed (by design):
- mpfaffenberger#2 (FallbackTriggeredError): Intentionally removed during code review as
  YAGNI. No fallback model infrastructure exists today — adding speculative
  code for a feature that doesn't exist yet violates YAGNI.
- mpfaffenberger#3 (context-overflow 400 auto-reduce max_tokens): Out of scope. This is
  a new feature requiring token counting, model-specific context limits,
  and prompt truncation — none of which belong in a retry engine PR.
- mpfaffenberger#4 (duplicate _toolsets line in base_agent.py): Pre-existing issue not
  introduced by this PR. Fixing unrelated code here violates minimal-diff
  principles.

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
code_puppy/llm_retry.py (2)

281-287: ⚠️ Potential issue | 🟠 Major

Clamp/validate config.max_retries before the loop.

A negative value can still be provided directly via LLMRetryConfig(max_retries=...), which can skip all attempts and hit the tail RetryExhaustedError path with last_error=None.

Proposed hardening

```diff
 async def llm_run_with_retry(
     coro_factory: Callable[[], Any],
     config: Optional[LLMRetryConfig] = None,
 ) -> Any:
@@
     if config is None:
         config = LLMRetryConfig()

-    max_retries = config.max_retries
+    max_retries = max(0, config.max_retries)
     overload_hits = 0
     last_error: Optional[Exception] = None
```
Also applies to: 362-365

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_puppy/llm_retry.py` around lines 281 - 287, Clamp and validate
config.max_retries before using it in the retry loop: ensure the value read from
config.max_retries is converted to an int and clamped to a non-negative (e.g.,
max(0, int(...))) and optionally bounded by a sensible upper limit, then assign
that sanitized value to the local max_retries used by the loop (the variable set
just above the "for attempt in range(1, max_retries + 2):" line); do the same
sanitization wherever you read config.max_retries (also at the later usage
around lines 362-365) so the loop cannot be skipped due to a negative value and
last_error cannot remain None when RetryExhaustedError is raised.

120-125: ⚠️ Potential issue | 🟡 Minor

Normalize x-should-retry value parsing.

The check is case-sensitive and whitespace-sensitive, so authoritative header hints like "True" or " false " are ignored.

Proposed fix

```diff
 def _get_x_should_retry(error: Exception) -> Optional[bool]:
@@
-    val = headers.get("x-should-retry")
-    if val == "true":
+    val = headers.get("x-should-retry")
+    if val is None:
+        return None
+    normalized = str(val).strip().lower()
+    if normalized == "true":
         return True
-    if val == "false":
+    if normalized == "false":
         return False
     return None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_puppy/llm_retry.py` around lines 120 - 125, The header parsing for
"x-should-retry" is currently case- and whitespace-sensitive; normalize the
retrieved value (the variable val from headers.get("x-should-retry")) by calling
strip() and lower() before comparison so values like " True " or "FALSE" are
handled correctly, then return True if normalized == "true", False if normalized
== "false", otherwise return None.
🧹 Nitpick comments (1)
code_puppy/llm_retry.py (1)

90-109: Support HTTP-date Retry-After values for full RFC compatibility.

Right now only numeric seconds are parsed. If the server sends a date-form Retry-After, this path falls back to computed backoff instead of honoring the header.

Proposed refactor

```diff
+from datetime import datetime, timezone
+from email.utils import parsedate_to_datetime
@@
 def _get_retry_after(error: Exception) -> Optional[float]:
@@
     try:
         return float(val)
     except (ValueError, TypeError):
-        return None
+        try:
+            dt = parsedate_to_datetime(str(val))
+            if dt.tzinfo is None:
+                dt = dt.replace(tzinfo=timezone.utc)
+            delta = (dt - datetime.now(timezone.utc)).total_seconds()
+            return delta if delta > 0 else None
+        except Exception:
+            return None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_puppy/llm_retry.py` around lines 90 - 109, The _get_retry_after function
currently only parses numeric seconds; update it to also accept HTTP-date
strings by, after the numeric float attempt, trying to parse the header value
with a RFC-compatible parser (e.g., email.utils.parsedate_to_datetime) to
produce a timezone-aware datetime, compute seconds = (parsed_datetime -
datetime.now(timezone.utc)).total_seconds() and return that seconds value if
positive (or 0), otherwise None; ensure you handle parsing exceptions and
non-aware datetimes by normalizing to UTC and return None on failure. Reference:
function name _get_retry_after, local variable val and headers retrieval logic.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77bec84 and b4a5c17.

📒 Files selected for processing (2)
  • code_puppy/llm_retry.py
  • tests/test_llm_retry.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_llm_retry.py

nhicks00 (Collaborator, Author) commented:

Response to second CodeRabbit review (b4a5c17)

1. Clamp config.max_retries in llm_run_with_retry() for direct constructor use:
Acknowledged as a valid defense-in-depth point. However, LLMRetryConfig is an internal API — not user-facing input. The env var path (the only user-facing entry point) is now guarded. A developer passing LLMRetryConfig(max_retries=-5) directly is a programming error, not a runtime input validation concern. If the maintainer prefers the extra max(0, ...) clamp, happy to add it — but it's not a correctness bug since the loop still terminates safely with negative values.

2. Normalize x-should-retry header (case/whitespace):
Not addressed — the Anthropic API sends exactly "true" or "false" per their SDK. Adding .strip().lower() normalization for hypothetical non-standard servers is speculative. If a real server sends "True" or " false ", we can add normalization then.

3. Support HTTP-date Retry-After values:
Not addressed — LLM APIs (Anthropic, OpenAI, Google) send numeric seconds exclusively. Adding RFC 7231 HTTP-date parsing with datetime imports is over-engineering for a case that doesn't occur in practice with any known LLM provider.



Development

Successfully merging this pull request may close these issues.

feat: LLM API retry engine with exponential backoff and gateway awareness
