pattern | python | Major | pending
Retry Patterns with Exponential Backoff and Jitter
Tags: retry, exponential backoff, jitter, thundering herd, resilience, transient error
Problem
When an external service fails, naive retries (immediate or at a fixed interval) cause thundering herd effects: many clients hit the service again at the same moment, which makes the outage worse.
Solution
Implement exponential backoff with jitter:
import random
import time
from functools import wraps

import requests


def retry_with_backoff(
    max_retries=3,
    base_delay=1.0,
    max_delay=60.0,
    exceptions=(Exception,)
):
    """Retry the decorated function using capped exponential backoff with full jitter."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # max_retries retries means max_retries + 1 attempts in total
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    # Out of retries: re-raise the last error to the caller
                    if attempt == max_retries:
                        raise
                    # Exponential backoff with full jitter
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    jittered = random.uniform(0, delay)
                    print(f'Attempt {attempt + 1} failed, '
                          f'retrying in {jittered:.1f}s: {e}')
                    time.sleep(jittered)
        return wrapper
    return decorator


# Note: raise_for_status() also raises on 4xx responses; pass a narrower
# `exceptions` tuple if client errors should not be retried.
@retry_with_backoff(max_retries=3, base_delay=1.0)
def call_external_api(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()

// JavaScript version
async function retryWithBackoff(fn, maxRetries = 3, baseDelay = 1000) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      const delay = Math.min(baseDelay * 2 ** attempt, 60000);
      const jitter = Math.random() * delay;
      await new Promise(r => setTimeout(r, jitter));
    }
  }
}

Backoff strategies:
- No jitter: delay = base * 2^attempt (all clients retry together)
- Full jitter: delay = random(0, base * 2^attempt) (best for reducing load)
- Decorrelated jitter: delay = random(base, prev_delay * 3) (good balance)
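The three strategies listed above can be sketched as small delay functions. This is a minimal illustration rather than a definitive implementation: the function names, the `cap` parameter, and the default values are assumptions made for this example, and the decorrelated variant expects the caller to carry the previous delay between calls.

```python
import random


def no_jitter_delay(attempt, base=1.0, cap=60.0):
    """Plain exponential backoff: every client computes the same delay."""
    return min(cap, base * 2 ** attempt)


def full_jitter_delay(attempt, base=1.0, cap=60.0):
    """Full jitter: uniform between 0 and the capped exponential delay."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def decorrelated_jitter_delay(prev_delay, base=1.0, cap=60.0):
    """Decorrelated jitter: uniform between base and 3x the previous delay, capped."""
    return min(cap, random.uniform(base, prev_delay * 3))


if __name__ == "__main__":
    prev = 1.0  # assumed starting point: the base delay
    for attempt in range(5):
        prev = decorrelated_jitter_delay(prev)
        print(attempt,
              no_jitter_delay(attempt),
              round(full_jitter_delay(attempt), 2),
              round(prev, 2))
```

Note that the decorrelated variant depends on the previous delay rather than on the attempt number, so the retry loop has to keep that value in a local variable.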
Why
Without jitter, all clients retry at the same time (thundering herd), making the overloaded service even more overloaded. Jitter spreads retries over time, giving the service a chance to recover.
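To make the effect concrete, the sketch below compares the retry delays a group of clients would pick after failing at the same moment, with and without jitter. The helper name `retry_delays`, the client count, and the fixed seed are assumptions chosen for illustration only.

```python
import random


def retry_delays(num_clients, attempt, base=1.0, jitter=True, seed=0):
    """Return the delay each client would pick after a shared failure."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    delay = base * 2 ** attempt
    if jitter:
        # Full jitter: each client draws independently from [0, delay]
        return [rng.uniform(0, delay) for _ in range(num_clients)]
    # No jitter: every client waits exactly the same amount of time
    return [delay] * num_clients


if __name__ == "__main__":
    without = retry_delays(1000, attempt=2, jitter=False)
    spread = retry_delays(1000, attempt=2, jitter=True)
    print("distinct retry times without jitter:", len(set(without)))
    print("distinct retry times with jitter:", len(set(spread)))
```

Without jitter the clients collapse onto a single retry instant, while with jitter the same clients are spread across the whole backoff window.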
Gotchas
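- Only retry on transient errors (5xx, timeouts), not on 4xx client errors, which will fail the same way again; 429 (Too Many Requests) and 408 (Request Timeout) are the usual exceptions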
- Set a maximum retry count AND maximum delay to prevent infinite waits
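Both gotchas can be checked in code. The sketch below is one possible approach under stated assumptions: `is_transient` and `worst_case_wait` are hypothetical helper names, the check uses Python's built-in `TimeoutError` and `ConnectionError` (an HTTP library will usually have its own exception types to test instead), and 429 handling is left out for simplicity.

```python
def is_transient(status_code=None, exc=None):
    """Return True if a failure looks transient and is worth retrying."""
    # Timeouts and dropped connections are treated as transient
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    # Server errors (5xx) are retried; 4xx client errors are not
    return status_code is not None and 500 <= status_code <= 599


def worst_case_wait(max_retries, base_delay, max_delay):
    """Upper bound on total sleep time, since jitter only shortens each delay."""
    return sum(min(base_delay * 2 ** attempt, max_delay)
               for attempt in range(max_retries))


if __name__ == "__main__":
    print(is_transient(status_code=503))    # server error
    print(is_transient(status_code=404))    # client error
    print(worst_case_wait(3, 1.0, 60.0))    # seconds, at most
```

Computing the worst-case wait up front shows why both limits matter: the retry count bounds the number of sleeps, and the maximum delay bounds the length of each one.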
Context
Building resilient distributed systems