Pattern · TypeScript · Major
Exponential backoff is required for LLM API rate limit errors
Tags: rate-limit · 429 · retry · exponential-backoff · jitter · resilience
Problem
LLM APIs enforce rate limits on requests-per-minute and tokens-per-minute. Simple immediate retries hammer the rate limit further and result in all requests failing rather than gracefully degrading.
Solution
Implement exponential backoff with jitter on 429 and 503 responses. Start with a 1s base delay, double on each retry, add random jitter to avoid thundering herd, and cap at ~60s with a maximum of 5 retries. Use the Retry-After header if provided.
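The delay schedule described above can be sketched as a pure helper. `backoffDelayMs` is an illustrative name, not from any SDK; the constants (1s base, 60s cap, 30% jitter) are the ones stated in the solution:

```typescript
// Compute the retry delay for a given attempt: base doubles each attempt,
// capped at 60s, with up to 30% random jitter added on top.
function backoffDelayMs(attempt: number, retryAfterHeader?: string): number {
  // Honor Retry-After (in seconds) when the server provides it.
  const retryAfter = Number(retryAfterHeader);
  if (retryAfterHeader !== undefined && !Number.isNaN(retryAfter)) {
    return retryAfter * 1000;
  }
  const base = Math.min(1000 * 2 ** attempt, 60_000);
  const jitter = Math.random() * base * 0.3;
  return base + jitter;
}
```

Isolating the schedule in a pure function like this also makes the backoff policy easy to unit-test without mocking any network calls.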
Why
Exponential backoff gives the API time to process the backlog. Jitter spreads retries across time, preventing all clients from retrying simultaneously and recreating the surge.
Gotchas
- Do not retry on 400 (bad request) or 401 (auth) errors — these will never succeed
- The openai SDK has built-in retry with maxRetries option — prefer it over custom implementations
- Token-per-minute limits are often more restrictive than request-per-minute limits for large models
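As the second gotcha notes, the openai Node SDK handles this itself. A minimal configuration sketch (`maxRetries` is a real client option; the SDK reads the API key from the `OPENAI_API_KEY` environment variable):

```typescript
import OpenAI from "openai";

// The SDK retries rate-limit and transient server errors with its own
// exponential backoff; maxRetries caps the attempts (the default is 2).
const client = new OpenAI({ maxRetries: 5 });
```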
Code Snippets
Exponential backoff with jitter for API retries
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Only retry rate-limit (429) and service-unavailable (503) responses.
      if (attempt === maxRetries || ![429, 503].includes(err.status)) throw err;
      // Honor Retry-After (in seconds) when provided; otherwise back off
      // exponentially from a 1s base, capped at 60s.
      const retryAfter = Number(err.headers?.['retry-after']);
      const base = Number.isNaN(retryAfter)
        ? Math.min(1000 * 2 ** attempt, 60_000)
        : retryAfter * 1000;
      // Up to 30% random jitter spreads retries out and avoids a thundering herd.
      const jitter = Math.random() * base * 0.3;
      await new Promise(r => setTimeout(r, base + jitter));
    }
  }
  throw new Error('Max retries exceeded'); // unreachable; satisfies the type checker
}

Context
Production LLM integrations with high traffic or batch processing