pattern, python, Major, pending

Retry Patterns with Exponential Backoff and Jitter

Submitted by: @anonymous
retry, exponential backoff, jitter, thundering herd, resilience, transient error

Problem

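When an external service fails, naive retries (immediate or at a fixed interval) make every client hit the service again at the same moments. This thundering herd effect adds load exactly when the service is least able to handle it and can make the outage worse.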

Solution

Implement exponential backoff with jitter:

import random
import time
from functools import wraps

def retry_with_backoff(
    max_retries=3,
    base_delay=1.0,
    max_delay=60.0,
    exceptions=(Exception,)
):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    if attempt == max_retries:
                        raise
                    # Exponential backoff with full jitter
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    jittered = random.uniform(0, delay)
                    print(f'Attempt {attempt + 1} failed, '
                          f'retrying in {jittered:.1f}s: {e}')
                    time.sleep(jittered)
        return wrapper
    return decorator

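import requests

# Retries any requests failure, including the HTTPError that
# raise_for_status() raises for 4xx; narrow `exceptions` as needed.
@retry_with_backoff(max_retries=3, base_delay=1.0,
                    exceptions=(requests.RequestException,))
def call_external_api(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()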


// JavaScript version
async function retryWithBackoff(fn, maxRetries = 3, baseDelay = 1000) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      const delay = Math.min(baseDelay * 2 ** attempt, 60000);
      const jitter = Math.random() * delay;
      await new Promise(r => setTimeout(r, jitter));
    }
  }
}


Backoff strategies:
  • No jitter: delay = min(cap, base * 2^attempt) (all clients retry together)
  • Full jitter: delay = random(0, min(cap, base * 2^attempt)) (best for reducing load; this is what the code above uses)
  • Decorrelated jitter: delay = min(cap, random(base, prev_delay * 3)) (good balance)
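
The three strategies above can be written as small delay calculators. This is a minimal sketch, not a reference implementation: the function names, the `cap` parameter (playing the role of `max_delay`), and the optional `rng` argument are assumptions introduced here for illustration.

```python
import random

def no_jitter(attempt, base=1.0, cap=60.0):
    """Plain exponential backoff: every client computes the same delay."""
    return min(cap, base * 2 ** attempt)

def full_jitter(attempt, base=1.0, cap=60.0, rng=random):
    """Uniform random delay between 0 and the capped exponential delay."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))

def decorrelated_jitter(prev_delay, base=1.0, cap=60.0, rng=random):
    """Random delay between `base` and 3x the previous delay, then capped.

    Unlike the other two, this depends on the previous delay rather than
    the attempt number, so the caller has to carry `prev_delay` forward.
    """
    return min(cap, rng.uniform(base, prev_delay * 3))
```

For decorrelated jitter, a caller would typically start with `prev_delay = base` and feed each returned value back in on the next attempt.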

Why

Without jitter, all clients retry at the same time (thundering herd), making the overloaded service even more overloaded. Jitter spreads retries over time, giving the service a chance to recover.
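
To make the effect concrete, here is a small simulation sketch. It assumes all clients failed at the same instant and compares how many retries land in the same short time window with and without full jitter. The helper names `retry_times` and `peak_per_window` and the 0.1 s window are assumptions made for this illustration.

```python
import random

def retry_times(num_clients, attempt, base=1.0, cap=60.0, jitter=True, rng=None):
    """Return how long each client waits before retry number `attempt`.

    Assumes every client failed at time 0, so the wait is also the
    moment at which its retry reaches the service.
    """
    if rng is None:
        rng = random.Random()
    ceiling = min(cap, base * 2 ** attempt)
    if not jitter:
        # Every client computes the identical delay.
        return [ceiling] * num_clients
    return [rng.uniform(0, ceiling) for _ in range(num_clients)]

def peak_per_window(times, window=0.1):
    """Largest number of retries landing in any one `window`-second bucket."""
    buckets = {}
    for t in times:
        key = int(t / window)
        buckets[key] = buckets.get(key, 0) + 1
    return max(buckets.values())
```

Without jitter, `peak_per_window` equals the number of clients because every retry arrives in the same bucket. With jitter the same retries are spread over the whole interval from 0 to the capped delay, so the peak is much lower.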

Gotchas

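  • Only retry on transient errors (5xx, timeouts, connection failures). Do not retry most 4xx client errors, since the same request will fail again; 408 and 429 are the usual exceptions, and a 429 may carry a Retry-After header worth honoring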
  • Set a maximum retry count AND maximum delay to prevent infinite waits
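
One way to apply both points is to decide per exception whether a retry makes sense, while still bounding the attempt count and the delay. The sketch below uses only the standard library. `HttpError`, `RETRYABLE_STATUS`, `is_retryable`, and `call_with_retry` are hypothetical names, and the chosen status codes are an assumption that should be adjusted for the service being called.

```python
import random
import time

# Assumed set of status codes worth retrying; adjust per service.
RETRYABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})

class HttpError(Exception):
    """Hypothetical exception carrying an HTTP status code."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def is_retryable(exc):
    """Timeouts and connection failures are transient; HTTP errors depend on status."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    return isinstance(exc, HttpError) and exc.status in RETRYABLE_STATUS

def call_with_retry(func, max_retries=3, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Call func(), retrying transient failures with capped full-jitter backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            # Give up at once on non-transient errors or when retries run out.
            if attempt == max_retries or not is_retryable(exc):
                raise
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is there so tests can pass a no-op instead of actually waiting.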

Context

Building resilient distributed systems
