Pattern · TypeScript · Major
Exponential backoff is required for LLM API rate limit errors
Tags: rate-limit · 429 · retry · exponential-backoff · jitter · resilience
Problem
LLM APIs enforce rate limits on requests-per-minute and tokens-per-minute. Simple immediate retries hammer the rate limit further and result in all requests failing rather than gracefully degrading.
Solution
Implement exponential backoff with jitter on 429 and 503 responses. Start with a 1s base delay, double on each retry, add random jitter to avoid thundering herd, and cap at ~60s with a maximum of 5 retries. Use the Retry-After header if provided.
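The delay schedule described above can be sketched as a pure helper. `backoffDelayMs` is an illustrative name, not from any SDK; the constants (1s base, 60s cap, 30% jitter) are the ones stated in the solution:

```typescript
// Compute the retry delay for a given attempt: base doubles each attempt,
// capped at 60s, with up to 30% random jitter added on top.
function backoffDelayMs(attempt: number, retryAfterHeader?: string): number {
  // Honor Retry-After (in seconds) when the server provides it.
  const retryAfter = Number(retryAfterHeader);
  if (retryAfterHeader !== undefined && !Number.isNaN(retryAfter)) {
    return retryAfter * 1000;
  }
  const base = Math.min(1000 * 2 ** attempt, 60_000);
  const jitter = Math.random() * base * 0.3;
  return base + jitter;
}
```

Isolating the schedule in a pure function like this also makes the backoff policy easy to unit-test without mocking any network calls.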
Why
Exponential backoff gives the API time to process the backlog. Jitter spreads retries across time, preventing all clients from retrying simultaneously and recreating the surge.
Gotchas
- Do not retry on 400 (bad request) or 401 (auth) errors — these will never succeed
- The openai SDK has built-in retry with maxRetries option — prefer it over custom implementations
- Token-per-minute limits are often more restrictive than request-per-minute limits for large models
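As the second gotcha notes, the openai Node SDK handles this itself. A minimal configuration sketch (`maxRetries` is a real client option; the SDK reads the API key from the `OPENAI_API_KEY` environment variable):

```typescript
import OpenAI from "openai";

// The SDK retries rate-limit and transient server errors with its own
// exponential backoff; maxRetries caps the attempts (the default is 2).
const client = new OpenAI({ maxRetries: 5 });
```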
Code Snippets
Exponential backoff with jitter for API retries
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Only retry rate-limit (429) and service-unavailable (503) responses.
      if (attempt === maxRetries || ![429, 503].includes(err.status)) throw err;
      // Honor Retry-After (in seconds) when provided; otherwise back off
      // exponentially from a 1s base, capped at 60s.
      const retryAfter = Number(err.headers?.['retry-after']);
      const base = Number.isNaN(retryAfter)
        ? Math.min(1000 * 2 ** attempt, 60_000)
        : retryAfter * 1000;
      // Up to 30% random jitter spreads retries out and avoids a thundering herd.
      const jitter = Math.random() * base * 0.3;
      await new Promise(r => setTimeout(r, base + jitter));
    }
  }
  throw new Error('Max retries exceeded'); // unreachable; satisfies the type checker
}

Context
Production LLM integrations with high traffic or batch processing