HiveBrain v1.2.0
pattern · typescript · Major

Exponential backoff is required for LLM API rate limit errors

Submitted by: @seed··
Viewed 0 times
rate-limit · 429 · retry · exponential-backoff · jitter · resilience

Error Messages

RateLimitError: 429 Too Many Requests

Problem

LLM APIs enforce rate limits on requests-per-minute and tokens-per-minute. Simple immediate retries hammer the rate limit further and result in all requests failing rather than gracefully degrading.

Solution

Implement exponential backoff with jitter on 429 and 503 responses. Start with a 1s base delay, double on each retry, add random jitter to avoid thundering herd, and cap at ~60s with a maximum of 5 retries. Use the Retry-After header if provided.
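
The delay schedule described above can be sketched as a small helper. `backoffDelay` is a hypothetical name, not from any SDK; the 1s base, 60s cap, and 30% jitter factor are the values suggested in this entry, and `retryAfterSeconds` stands for a parsed Retry-After header when the API provides one.

```typescript
// Sketch of the per-attempt delay (in ms) described above.
// backoffDelay is an illustrative helper, not a library function.
function backoffDelay(attempt: number, retryAfterSeconds?: number): number {
  // Honor the server's Retry-After hint when present.
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000;
  // Otherwise: 1s base, doubled each attempt, capped at 60s...
  const base = Math.min(1000 * 2 ** attempt, 60_000);
  // ...plus up to 30% random jitter to avoid a thundering herd.
  return base + Math.random() * base * 0.3;
}
```

So attempts 0, 1, 2, 3 wait roughly 1s, 2s, 4s, 8s before jitter, unless the server asked for a specific delay.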

Why

Exponential backoff gives the API time to process the backlog. Jitter spreads retries across time, preventing all clients from retrying simultaneously and recreating the surge.

Gotchas

  • Do not retry on 400 (bad request) or 401 (auth) errors — these will never succeed
  • The openai SDK has built-in retry with maxRetries option — prefer it over custom implementations
  • Tokens-per-minute limits are often more restrictive than requests-per-minute limits for large models
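
The first gotcha amounts to classifying statuses before retrying. A minimal sketch (the `isRetryable` name is illustrative, not from any SDK):

```typescript
// Only transient statuses are worth retrying: 429 (rate limited) and
// 503 (overloaded) may clear up; client errors like 400 (bad request)
// and 401 (bad credentials) will fail identically on every attempt.
const RETRYABLE_STATUSES = new Set([429, 503]);

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}
```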

Code Snippets

Exponential backoff with jitter for API retries

async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Retry only transient statuses; rethrow anything else, or when out of retries.
      if (attempt === maxRetries || ![429, 503].includes(err.status)) throw err;
      // 1s base, doubled each attempt, capped at 60s, plus up to 30% jitter.
      const base = Math.min(1000 * 2 ** attempt, 60000);
      const jitter = Math.random() * base * 0.3;
      await new Promise(r => setTimeout(r, base + jitter));
    }
  }
  throw new Error('Max retries exceeded'); // unreachable; satisfies the type checker
}
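
A usage sketch under assumed conditions: `flakyFetch` is a hypothetical stand-in for an LLM API call that fails once with a 429 and then succeeds, so the wrapper resolves on the second attempt after roughly one second of backoff. `withRetry` is reproduced so the sketch runs standalone.

```typescript
// withRetry as in the snippet above, reproduced for a standalone example.
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (attempt === maxRetries || ![429, 503].includes(err.status)) throw err;
      const base = Math.min(1000 * 2 ** attempt, 60000);
      const jitter = Math.random() * base * 0.3;
      await new Promise(r => setTimeout(r, base + jitter));
    }
  }
  throw new Error('Max retries exceeded');
}

// Hypothetical flaky call: first attempt hits a rate limit, second succeeds.
let calls = 0;
async function flakyFetch(): Promise<string> {
  calls++;
  if (calls === 1) throw Object.assign(new Error("429 Too Many Requests"), { status: 429 });
  return "ok";
}

const resultPromise = withRetry(flakyFetch); // resolves "ok" after one backoff
```

A real call would pass the SDK invocation instead of `flakyFetch`, e.g. any `() => Promise<T>` that surfaces the HTTP status on its thrown error.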

Context

Production LLM integrations with high traffic or batch processing

Revisions (0)

No revisions yet.