Skip to content

feat: fetcher reads retry_after from body when Cloudflare returns retry-after:0#148

Closed
kribor wants to merge 1 commit into
mainfrom
feat/fetcher-cloudflare-retry-after-body
Closed

feat: fetcher reads retry_after from body when Cloudflare returns retry-after:0#148
kribor wants to merge 1 commit into
mainfrom
feat/fetcher-cloudflare-retry-after-body

Conversation

@kribor

@kribor kribor commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Cloudflare 429 responses set retry-after: 0 in the HTTP header but encode the actual delay (e.g. 30 s) in the JSON body as retry_after
  • Previously, a header value of 0 fell through to exponential backoff, causing premature retries
  • Now getRetryTimeout checks res.body.retry_after for any 429/503 where the header-based timeout resolves to ≤ 0

Test plan

  • New test retryAfter cloudflare body covers the Cloudflare pattern: retry-after: 0 header + { retry_after: 2 } in body
  • Existing retryAfter and retryAfter date tests continue to pass

…ry-after:0

Cloudflare 429 responses set retry-after:0 in the header but encode the
actual delay (e.g. 30 s) in the JSON body as retry_after. The header
value parses to 0, which previously fell through to exponential backoff
and caused premature retries. Now we check res.body.retry_after for any
429/503 where the header-based timeout is ≤ 0.
@kribor kribor requested a review from kirillgroshkov as a code owner June 13, 2026 15:19
@kirillgroshkov

Copy link
Copy Markdown
Member

Review: reading retry_after from the response body

Thanks for digging into this — the underlying retry-on-429 frustration is real. But I'd push back on landing this in the core Fetcher, because the mechanism it adds is non-standard, and the "Cloudflare does this" justification doesn't hold up against Cloudflare's own docs. I'd rather solve the real case with a per-project hook and (optionally) a small standards-compliant fix to the header parsing. Details below.

What is standard

The spec-defined "retry later" channel is the Retry-After HTTP header (RFC 9110 §10.2.3, originally RFC 6585 for 429). It's either delay-seconds or an HTTP-date — and getRetryTimeout already handles both, plus the x-ratelimit-reset fallback (a widely-adopted de-facto convention, now being formalized as the IETF RateLimit/RateLimit-Policy header draft). That's all squarely standard.

Note the standard semantics of Retry-After: 0: it means "retry immediately" — it's a legitimate value, not a sentinel for "ignore me and look elsewhere."

Why reading retry_after from the body is non-standard

There is no RFC — and not even the IETF rate-limit draft — that defines a retry delay in the response body. Every standardized mechanism is header-based. A retry_after field in the JSON body is purely a per-vendor application convention. The canonical example is Discord's API, whose 429 body looks like { "message": "You are being rate limited.", "retry_after": 64.57, "global": false }. That's where this pattern comes from — not from HTTP, and not from Cloudflare.

The Cloudflare premise doesn't match Cloudflare's docs

The PR is built on "Cloudflare sets retry-after: 0 in the header but encodes the real delay in the body." Cloudflare's own documentation says the opposite:

  • On a 429, Cloudflare returns Retry-After = "number of seconds until more capacity is available, rounded up" — i.e. the real value goes in the header, and "all of Cloudflare's latest SDKs automatically respond to the headers." (changelog, Error 429 docs)
  • Where Cloudflare's API does expose a retry_after body field, the docs state the Retry-After header value matches the body field — they're consistent, not header=0 / body=real.

So the retry-after: 0 + body-only-delay scenario is not documented Cloudflare behavior. Most likely it's a specific origin server sitting behind Cloudflare that emits a Discord-style body — which makes it that one API's convention, not a Cloudflare standard we should bake into the shared client.

A latent bug this actually surfaces (worth a separate, standard-only fix)

While verifying the premise I found the description's "a header value of 0 fell through to exponential backoff" isn't quite what happens today. Trace getRetryTimeout with retry-after: 0:

  1. Number('0')0 (falsy) → goes to the else/date branch
  2. new Date('0')not Invalid Date; V8 parses it as year 2000
  3. timeout = Number(date) - Date.now() → a large negative number (~ -8.3e11)
  4. if (!timeout) is false (negative is truthy), so it never falls through to the backoff — getRetryTimeout returns the negative value → pDelay(negative) → retry near-immediately

So retry-after: 0 currently means "retry now" by accident (which, ironically, is the standard meaning). The robust, standards-only fix is to treat any non-positive / unparseable header result as "no usable value" and fall through to the normal backoff:

// getRetryTimeout — change the fallback guard from `if (!timeout)` to:
if (!(timeout > 0)) {
  // covers 0, NaN, and the negative value `new Date('0')` produces
  const noise = Math.random() * 500
  timeout = res.retryStatus.retryTimeout + noise
}

This stays 100% header-based/standard, fixes the retry-after: 0 edge, and — importantly — is what makes the hook approach below actually take effect.

Recommended approach: a per-project onBeforeRetry hook

The fetcher already ships the exact extension point for this. onBeforeRetry is documented as "Allows to mutate res.retryStatus to override retry behavior" (fetcher.model.ts), and by the time it runs res.body is already populated (onNotOkResponse does res.body = _jsonParseIfPossible(...)). So the project that talks to the Discord/Cloudflare-fronted API can opt in locally without changing the shared client:

fetcher.onBeforeRetry(res => {
  const status = res.fetchResponse?.status
  if (status !== 429 && status !== 503) return

  // Only step in when the standard header gave us nothing usable
  const header = res.fetchResponse?.headers.get('retry-after')
  if (header && Number(header) > 0) return

  const bodyRetryAfter = (res.body as any)?.retry_after
  if (typeof bodyRetryAfter !== 'number' || bodyRetryAfter <= 0) return

  // NOTE: retryStatus.retryTimeout is multiplied by `timeoutMultiplier` and clamped to
  // `timeoutMax` *after* this hook, so divide to land on the value the server asked for,
  // and make sure timeoutMax is high enough if the delay can exceed it (default 30s).
  res.retryStatus.retryTimeout = (bodyRetryAfter * 1000) / res.req.retry.timeoutMultiplier
})

(Requires the fall-through fix above so retry-after: 0 doesn't short-circuit to the negative header value.)

Bottom line

  • Don't merge the body-parsing into core — it couples the shared client to a vendor wire format (Discord-style), adds a body as any cast, and is justified by Cloudflare behavior that Cloudflare's docs contradict.
  • If you want a core change, take just the standard-only if (!(timeout > 0)) fall-through fix — that's a genuine, spec-aligned improvement.
  • Handle the body-retry_after quirk in the consuming project via onBeforeRetry, where the non-standard assumption is scoped to the API that actually needs it.

Sources: MDN/RFC — Retry-After · Cloudflare rate-limiting headers changelog · Cloudflare Error 429 · Discord rate limits

@kirillgroshkov

Copy link
Copy Markdown
Member

The above is Claude's review, after I asked it to check, as it all sounded like a super non-standard behavior. My intention is to not support non-standard things in Fetcher (in our open source libs in general).

Instead, proposal is to add a custom hook that will incorporate this custom retry behavior to where it's used (script in NCB3).

@kribor

kribor commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Ok, makes sense. In my claude session it sounded like this was well known and consistent cloudflare behavior and caused by technical details in different layers setting headers for body response: WAF sets retry-after: 0 because it doesn't yet know what the real value should be, so that is returned in the body instead.

I opened the PR thinking that Cloudflare is so big part of internet that if it's consistent across cloudflare it could make sense to support in fetcher but since it doesn't appear to be the case, makes sense to keep in the hook

@kribor kribor closed this Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants