Add LLM API retry engine for transient failures #211
nhicks00 wants to merge 2 commits into mpfaffenberger:main
Conversation
Add llm_run_with_retry() that wraps pydantic_agent.run() calls with retry logic for transient API failures (429, 529, 5xx, network errors, streaming failures).

Key features:
- Exponential backoff with 25% jitter, 500ms base, 32s cap
- Retry-After header takes absolute priority
- 529 consecutive overload detection (short-circuits after 3)
- Abort-aware sleep via asyncio.Event for user cancellation
- Streaming error detection (handles pydantic_ai stream failures)
- Full __cause__ chain walker for wrapped network errors
- api_retry_start / api_retry_end callback hooks
- PUPPY_MAX_LLM_RETRIES env var override (default 10)

Walmart fork sync notes:
- callbacks.py: additive merge — keep run_shell_command_output, add api_retry_start/api_retry_end
- base_agent.py: delete _run_with_streaming_retry() entirely, replace with llm_run_with_retry(). Rename event_stream_handler -> stream_handler to match fork naming.
- llm_retry.py and tests/test_llm_retry.py are clean file adds

Resolves: mpfaffenberger#210

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough

Adds a new LLM API retry engine and integrates it into agent execution.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller as Agent Caller
    participant RetryWrapper as llm_run_with_retry
    participant AgentRun as pydantic_agent.run
    participant API as LLM API
    participant Callbacks as Callbacks

    Caller->>RetryWrapper: llm_run_with_retry(coro_factory, config)
    loop Retry Loop (attempts ≤ max_retries)
        RetryWrapper->>AgentRun: Invoke coro_factory() (Attempt N)
        AgentRun->>API: LLM request
        alt Success
            API-->>AgentRun: Response
            AgentRun-->>RetryWrapper: Result
            RetryWrapper->>Callbacks: api_retry_end(total_attempts)
            RetryWrapper-->>Caller: Return result
        else Retryable Error
            API-->>AgentRun: Transient error (429/5xx/timeout/overload)
            AgentRun-->>RetryWrapper: Exception
            RetryWrapper->>Callbacks: api_retry_start(error, attempt, delay_ms, max_retries)
            RetryWrapper->>RetryWrapper: Compute delay (Retry-After or exp+jit)
            alt Cancel Event Set
                RetryWrapper-->>Caller: Propagate CancelledError
            else Sleep then retry
                RetryWrapper->>RetryWrapper: Sleep(delay_ms) (cancellable)
            end
        else Non-retryable / Exhausted
            API-->>AgentRun: Fatal error or retries exhausted
            AgentRun-->>RetryWrapper: Exception
            RetryWrapper-->>Caller: Raise RetryExhaustedError (with original)
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: 4 passed, 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (2)
code_puppy/agents/base_agent.py (1)
1909-1910: Remove duplicate `_toolsets` assignment. Line 1910 repeats the same assignment from Line 1909.
✂️ Tiny cleanup

```diff
 original_toolsets = pydantic_agent._toolsets
 pydantic_agent._toolsets = original_toolsets + self._mcp_servers
-pydantic_agent._toolsets = original_toolsets + self._mcp_servers
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code_puppy/agents/base_agent.py` around lines 1909-1910: Remove the duplicate assignment to pydantic_agent._toolsets: there are two identical lines setting pydantic_agent._toolsets = original_toolsets + self._mcp_servers; keep a single assignment and delete the redundant one so only one update to pydantic_agent._toolsets remains (refer to pydantic_agent, _toolsets, original_toolsets, and self._mcp_servers to locate the lines).

tests/test_llm_retry.py (1)
385-399: `test_retry_after_header_respected` does not currently prove header precedence. Using retry-after: "0" plus no `_compute_backoff` argument assertion allows this test to pass even if header extraction regresses.

✅ Make the test assert the actual contract
```diff
 async def test_retry_after_header_respected(self):
     """Retry-After header value flows into backoff calculation."""
     call_count = 0

     async def factory():
         nonlocal call_count
         call_count += 1
         if call_count == 1:
-            raise _make_api_error(429, headers={"retry-after": "0"})
+            raise _make_api_error(429, headers={"retry-after": "1.5"})
         return "ok"

-    # We want to verify the header was extracted, not that we waited
-    result = await llm_run_with_retry(factory, config=LLMRetryConfig(max_retries=3))
+    with patch("code_puppy.llm_retry._compute_backoff", return_value=0.001) as backoff:
+        result = await llm_run_with_retry(
+            factory, config=LLMRetryConfig(max_retries=3)
+        )
+    backoff.assert_called_once_with(1, 1.5)
     assert result == "ok"
     assert call_count == 2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_llm_retry.py` around lines 385 - 399, Update test_retry_after_header_respected to assert header precedence by injecting a custom _compute_backoff that would return a different backoff than the Retry-After header so the test fails if header extraction regresses: have the factory raise a 429 with a non-zero "retry-after" value, call llm_run_with_retry(factory, config=LLMRetryConfig(max_retries=3), _compute_backoff=<custom_callable>) where <custom_callable> returns a sentinel backoff, and assert the retry used the header-derived value (e.g., by observing call_count==2 and/or by verifying that the computed backoff came from the header rather than the injected _compute_backoff). Ensure you reference test_retry_after_header_respected, llm_run_with_retry, _compute_backoff, and LLMRetryConfig when making the change.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- code_puppy/agents/base_agent.py
- code_puppy/callbacks.py
- code_puppy/llm_retry.py
- tests/test_llm_retry.py
```python
@dataclass
class LLMRetryConfig:
    """Configuration for the LLM retry engine."""

    max_retries: int = field(default_factory=_resolve_max_retries)
    cancel_event: Optional[asyncio.Event] = None
```
Consecutive-529 short-circuit cannot trigger fallback behavior.
Line 312 always raises RetryExhaustedError, and LLMRetryConfig (Line 55-Line 56) has no fallback configuration, so overload-threshold fallback cannot be enabled.
🧩 Suggested API extension

```diff
 @dataclass
 class LLMRetryConfig:
     """Configuration for the LLM retry engine."""

     max_retries: int = field(default_factory=_resolve_max_retries)
     cancel_event: Optional[asyncio.Event] = None
+    fallback_model: Optional[str] = None
+    max_consecutive_overloads: int = _MAX_CONSECUTIVE_OVERLOADS
+
+
+class FallbackTriggeredError(Exception):
+    def __init__(self, fallback_model: str, original_error: Exception):
+        super().__init__(f"Switch to fallback model: {fallback_model}")
+        self.fallback_model = fallback_model
+        self.original_error = original_error
@@
-        if overload_hits >= _MAX_CONSECUTIVE_OVERLOADS:
+        if overload_hits >= config.max_consecutive_overloads:
+            if config.fallback_model:
+                raise FallbackTriggeredError(
+                    config.fallback_model, error
+                ) from error
             raise RetryExhaustedError(
-                f"API returned {_MAX_CONSECUTIVE_OVERLOADS} consecutive "
+                f"API returned {config.max_consecutive_overloads} consecutive "
                 f"overloaded errors",
                 error,
             ) from error
```

Also applies to: 307-317
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@code_puppy/llm_retry.py` around lines 51 - 57, LLMRetryConfig currently only
exposes max_retries and cancel_event, so the retry loop always raises
RetryExhaustedError (e.g., in the function that currently forces that at lines
~307-317) and there's no way to enable overload-threshold fallback; add a
fallback configuration to LLMRetryConfig (for example a fallback_strategy enum
or a boolean like enable_overload_fallback plus optional fallback parameters
such as overload_threshold and fallback_backoff) and update the retry logic (the
routine that raises RetryExhaustedError) to consult this new config: when
overload conditions are met and fallback is enabled, execute the configured
fallback behavior instead of unconditionally raising RetryExhaustedError. Ensure
references to LLMRetryConfig, max_retries, cancel_event, and the retry
loop/RetryExhaustedError raise site are updated to use the new fields.
Intentionally not addressed — YAGNI. We originally implemented FallbackTriggeredError and fallback_model config, then deliberately removed them during code review. No fallback model infrastructure exists in Code Puppy today, and adding speculative plumbing for a feature that doesn't exist yet violates YAGNI. When model fallback is needed, it can be added as a focused follow-up with its own tests and config surface.
@nhicks00, understood! Thank you for the context. Applying YAGNI to avoid speculative plumbing for non-existent infrastructure is the right call—better to add the fallback config surface when the actual fallback model logic lands. I've noted this decision for future reference.
✏️ Learnings added
Learnt from: nhicks00
Repo: mpfaffenberger/code_puppy PR: 211
File: code_puppy/llm_retry.py:59-65
Timestamp: 2026-02-27T03:54:07.263Z
Learning: In code_puppy/llm_retry.py, FallbackTriggeredError and fallback_model configuration were intentionally removed during review. No fallback model infrastructure exists in Code Puppy today, and adding speculative plumbing violates YAGNI. Fallback features should be added later as a focused follow-up when needed.
```python
status = _get_status_code(error)
if status is None:
    # No status code and not a recognized transient pattern — don't retry
    # blindly. Only specifically-identified patterns above are retried.
    return False

if status == 408:
    return True  # Request Timeout
if status == 409:
    return True  # Conflict
if status == 429:
    return True  # Rate Limit
if status == 401:
    return True  # Unauthorized (token may need refresh)
if status >= 500:
    return True  # Server errors

# 400 (non-overflow), 402, 403, 404, 422, etc. — fatal
return False
```
Missing context-overflow recovery path for retryable 400 overflow errors.
Line 193 treats 400 as fatal, and Line 328 immediately exits. The required context-overflow handling (parse inputTokens/contextLimit, compute available tokens, set max_tokens=max(3000, available), retry immediately without sleep) is not implemented.
Also applies to: 327-333
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@code_puppy/llm_retry.py` around lines 176 - 194, The retry logic currently
treats status 400 as fatal; update the function that uses _get_status_code (the
is_retryable / should_retry path around _get_status_code(error)) to detect the
specific context-overflow 400 response and implement the context-overflow
recovery: when the error payload indicates inputTokens/contextLimit overflow,
parse inputTokens and contextLimit from the error, compute available =
contextLimit - inputTokens, set the retry request's max_tokens to max(3000,
available) and flag this retry to happen immediately (no sleep/backoff), then
requeue the request; leave other 400s unchanged (still fatal). Ensure this same
handling is added where the code currently exits on 400 in the other retry
branch referenced (the block around the immediate-exit at the other use of
_get_status_code).
Intentionally not addressed — out of scope. Context-overflow recovery (detecting 400s caused by exceeding context window, then auto-reducing max_tokens or truncating prompts) is a new feature, not a bug in the retry engine. It would require token counting logic, model-specific context limits, and prompt truncation strategies — none of which belong in a retry PR. This would be a good candidate for a separate issue and focused follow-up PR.
@nhicks00, that's a reasonable architectural decision. You're right that context-overflow recovery involves significant complexity (token counting, model-specific context windows, prompt truncation strategies) that would expand the scope of this retry engine PR considerably.
While issue #210 did reference context-overflow handling, your core retry engine implementation successfully addresses the primary objective: robust exponential backoff and transient failure handling for the identified error classes. Separating the context-overflow feature into a focused follow-up PR will allow for proper design and testing of that specific capability.
Would you like me to open a follow-up issue to track the context-overflow recovery feature? I can draft it with the specific requirements (detect context-window 400s, parse token limits, compute available tokens, adjust max_tokens, immediate retry) and link it back to this PR and #210 for context.
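If that follow-up ever lands, the parsing step the review describes might look like this sketch. The `inputTokens`/`contextLimit` field names, the message format, and the `max(3000, ...)` floor all come from the review comment above; real provider payloads encode this differently, and `overflow_budget` is a hypothetical helper, not anything in the PR.

```python
import re
from typing import Optional

def overflow_budget(message: str) -> Optional[int]:
    """Parse inputTokens/contextLimit out of a 400 error message.

    Returns a reduced max_tokens budget when the error looks like a
    context-window overflow, or None for any other 400 (still fatal).
    """
    match = re.search(r"inputTokens[^0-9]*(\d+).*?contextLimit[^0-9]*(\d+)", message)
    if not match:
        return None
    input_tokens, context_limit = int(match.group(1)), int(match.group(2))
    available = context_limit - input_tokens
    # Floor at 3000 so a nearly-full context still leaves a usable budget.
    return max(3000, available)
```

The retry loop would then requeue immediately (no backoff sleep) with `max_tokens` set to the returned budget, per the review's recipe.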
- Clamp negative PUPPY_MAX_LLM_RETRIES values to default with warning
- Tighten test_retry_after_header_respected to assert _compute_backoff receives the parsed header value (1.5), not just that retry occurred
- Add test_config_negative_env test

Addresses CodeRabbit review findings mpfaffenberger#1 and mpfaffenberger#5 on PR mpfaffenberger#211.

CodeRabbit findings not addressed (by design):
- mpfaffenberger#2 (FallbackTriggeredError): Intentionally removed during code review as YAGNI. No fallback model infrastructure exists today — adding speculative code for a feature that doesn't exist yet violates YAGNI.
- mpfaffenberger#3 (context-overflow 400 auto-reduce max_tokens): Out of scope. This is a new feature requiring token counting, model-specific context limits, and prompt truncation — none of which belong in a retry engine PR.
- mpfaffenberger#4 (duplicate _toolsets line in base_agent.py): Pre-existing issue not introduced by this PR. Fixing unrelated code here violates minimal-diff principles.
♻️ Duplicate comments (2)
code_puppy/llm_retry.py (2)
281-287: ⚠️ Potential issue | 🟠 Major. Clamp/validate `config.max_retries` before the loop. A negative value can still be provided directly via `LLMRetryConfig(max_retries=...)`, which can skip all attempts and hit the tail `RetryExhaustedError` path with `last_error=None`.

Proposed hardening
```diff
 async def llm_run_with_retry(
     coro_factory: Callable[[], Any],
     config: Optional[LLMRetryConfig] = None,
 ) -> Any:
@@
     if config is None:
         config = LLMRetryConfig()
-    max_retries = config.max_retries
+    max_retries = max(0, config.max_retries)
     overload_hits = 0
     last_error: Optional[Exception] = None
```

Also applies to: 362-365
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code_puppy/llm_retry.py` around lines 281 - 287, Clamp and validate config.max_retries before using it in the retry loop: ensure the value read from config.max_retries is converted to an int and clamped to a non-negative (e.g., max(0, int(...))) and optionally bounded by a sensible upper limit, then assign that sanitized value to the local max_retries used by the loop (the variable set just above the "for attempt in range(1, max_retries + 2):" line); do the same sanitization wherever you read config.max_retries (also at the later usage around lines 362-365) so the loop cannot be skipped due to a negative value and last_error cannot remain None when RetryExhaustedError is raised.
120-125: ⚠️ Potential issue | 🟡 Minor. Normalize `x-should-retry` value parsing. The check is case-sensitive and whitespace-sensitive, so authoritative header hints like `"True"` or `" false "` are ignored.

Proposed fix
```diff
 def _get_x_should_retry(error: Exception) -> Optional[bool]:
@@
-    val = headers.get("x-should-retry")
-    if val == "true":
+    val = headers.get("x-should-retry")
+    if val is None:
+        return None
+    normalized = str(val).strip().lower()
+    if normalized == "true":
         return True
-    if val == "false":
+    if normalized == "false":
         return False
     return None
```
Verify each finding against the current code and only fix it if needed. In `@code_puppy/llm_retry.py` around lines 120 - 125, The header parsing for "x-should-retry" is currently case- and whitespace-sensitive; normalize the retrieved value (the variable val from headers.get("x-should-retry")) by calling strip() and lower() before comparison so values like " True " or "FALSE" are handled correctly, then return True if normalized == "true", False if normalized == "false", otherwise return None.
🧹 Nitpick comments (1)
code_puppy/llm_retry.py (1)
90-109: Support HTTP-date `Retry-After` values for full RFC compatibility. Right now only numeric seconds are parsed. If the server sends a date-form `Retry-After`, this path falls back to computed backoff instead of honoring the header.

Proposed refactor
```diff
+from datetime import datetime, timezone
+from email.utils import parsedate_to_datetime
@@
 def _get_retry_after(error: Exception) -> Optional[float]:
@@
     try:
         return float(val)
     except (ValueError, TypeError):
-        return None
+        try:
+            dt = parsedate_to_datetime(str(val))
+            if dt.tzinfo is None:
+                dt = dt.replace(tzinfo=timezone.utc)
+            delta = (dt - datetime.now(timezone.utc)).total_seconds()
+            return delta if delta > 0 else None
+        except Exception:
+            return None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code_puppy/llm_retry.py` around lines 90 - 109, The _get_retry_after function currently only parses numeric seconds; update it to also accept HTTP-date strings by, after the numeric float attempt, trying to parse the header value with a RFC-compatible parser (e.g., email.utils.parsedate_to_datetime) to produce a timezone-aware datetime, compute seconds = (parsed_datetime - datetime.now(timezone.utc)).total_seconds() and return that seconds value if positive (or 0), otherwise None; ensure you handle parsing exceptions and non-aware datetimes by normalizing to UTC and return None on failure. Reference: function name _get_retry_after, local variable val and headers retrieval logic.
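For reference, a standalone, runnable version of the two-form Retry-After parsing discussed in this thread (delta-seconds or HTTP-date, per RFC 9110). The helper name `parse_retry_after` and the injectable `now` parameter are illustrative; the PR's actual function is `_get_retry_after` and reads the value from error headers.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait per a Retry-After value: delta-seconds or HTTP-date."""
    try:
        seconds = float(value)  # delta-seconds form, e.g. "120"
        return seconds if seconds >= 0 else None
    except ValueError:
        pass
    try:
        dt = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # neither form parsed; caller falls back to backoff
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = (dt - (now or datetime.now(timezone.utc))).total_seconds()
    return delta if delta > 0 else None
```

Dates already in the past yield None rather than a zero or negative delay, so the caller's computed backoff still applies.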
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- code_puppy/llm_retry.py
- tests/test_llm_retry.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_llm_retry.py
Response to second CodeRabbit review (b4a5c17):
1. Clamp `config.max_retries` before the retry loop
2. Normalize `x-should-retry` header parsing
3. Support HTTP-date `Retry-After` values
Summary
Adds `llm_run_with_retry()` — a retry wrapper around all three `pydantic_agent.run()` call sites in `base_agent.py`. Handles 429 rate limits, 529 overloads, 5xx server errors, network failures, and streaming errors with exponential backoff.

- `code_puppy/llm_retry.py` (357 lines)
- `base_agent.py` (3 call sites wrapped), `callbacks.py` (+2 phase types)
- tests for `llm_retry.py`

What it does
- Full `__cause__` chain walking (through pydantic_ai → SDK → httpx → OSError)
- Streaming error detection ("Streamed response ended" from pydantic_ai)
- `Retry-After` header support
- `x-should-retry` header support

Configurable via `PUPPY_MAX_LLM_RETRIES` env var (default 10).

Walmart Fork Sync Notes
This PR modifies core code that diverges from the Walmart fork. Here's what
the sync merge looks like:
- `callbacks.py` — Additive merge, no conflicts expected. Walmart fork has `run_shell_command_output` (not in OSS); this PR adds `api_retry_start`/`api_retry_end`. Keep all entries from both sides.
- `base_agent.py` — Merge conflict expected at the `pydantic_agent.run()` call sites. The Walmart fork wraps these with `_run_with_streaming_retry()`. Resolution:
  - Delete `_run_with_streaming_retry()`, `MAX_STREAMING_RETRIES`, and `STREAMING_RETRY_DELAYS` entirely
  - Use the `llm_run_with_retry()` wrapper from this PR — it handles the same streaming error patterns (`"streamed response ended"`) plus HTTP-level retries that `_run_with_streaming_retry` never covered
  - Rename `event_stream_handler` → `stream_handler` to match Walmart fork naming
  - Wire the `except* RetryExhaustedError` handler into the fork's exception chain
- `llm_retry.py` and `tests/test_llm_retry.py` — New files, land cleanly.

Design Note: Why core, not a plugin
We considered making retry a plugin via a `wrap_llm_call` callback, which would let the Walmart fork register a gateway-aware retry strategy without merge conflicts. We opted against it because the existing callback system is fire-and-forget — it doesn't support middleware-style coroutine wrapping without significant plumbing changes. A utility function in core is the simplest approach that works.

For Walmart-specific gateway intelligence (e.g., detecting Vertex AI per-minute quota windows, per-model quota tracking), a Walmart fork plugin can register an `api_retry_start` callback to layer on top of the core retry engine. The hooks are already in place for this — no core changes needed.
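The `__cause__` chain walk that powers transient-error detection can be sketched as follows. The helper names `walk_causes`/`is_transient` and the exact set of exception types treated as network-level are assumptions for illustration, not the module's actual API.

```python
from typing import Iterator

def walk_causes(error: BaseException) -> Iterator[BaseException]:
    """Yield error and every exception in its __cause__/__context__ chain.

    Wrapped transports (pydantic_ai -> SDK -> httpx -> OSError) bury the
    retryable error several layers deep; walking the chain surfaces it.
    The `seen` set guards against accidental cycles.
    """
    seen: set[int] = set()
    current: BaseException | None = error
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        yield current
        current = current.__cause__ or current.__context__

def is_transient(error: BaseException) -> bool:
    """True when any link in the chain is a plain network-level failure."""
    return any(
        isinstance(e, (OSError, ConnectionError, TimeoutError))
        for e in walk_causes(error)
    )
```

A real classifier would also consult status codes and streaming-error message patterns before the network-level fallback, as described above.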
Test plan
- Simulated transient failure (`ModelHTTPError(status_code=429)`) — retries and recovers
- `ruff format` / `ruff check` clean

Resolves #210