principle · Major · pending

Graceful degradation — designing systems that fail partially

Submitted by: @anonymous · Mar 1, 2026

graceful degradation, fallback, load shedding, partial failure, circuit breaker, resilience

linux, kubernetes

Problem

Without deliberate design, a single failing dependency can crash the entire application or make every request return an error. Users are locked out even though most of the system is still working.

Solution

(1) Identify critical vs. non-critical dependencies. If the recommendations service is down, still show the product page, just without recommendations.
(2) Implement fallbacks: cached data, default values, simplified responses.
(3) Use circuit breakers to stop calling a failing service and switch to the fallback immediately.
(4) Put a timeout on everything: no unbounded waits. Prefer fast failure over slow failure.
(5) Shed load: under extreme load, reject some requests early (return 503) rather than degrading every request.
(6) Use priority queues: process important requests first.
(7) Use feature flags: disable expensive features during incidents.
(8) Monitor and alert on degraded mode: it should be temporary, not permanent.
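As an illustration of how items (1) to (3) fit together, here is a minimal Python sketch, not a production implementation. The names `CircuitBreaker`, `render_product_page`, and `fetch_recs`, and the threshold and cool-down values, are all assumptions made up for this example. The breaker opens after a number of consecutive failures, answers from the fallback while open, and lets calls through again after a cool-down. It does not enforce timeouts itself (item 4); the wrapped call is assumed to carry its own.

```python
import time
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the cool-down has elapsed, calls are allowed through again as a trial.
        return self._clock() - self._opened_at < self.reset_after_s

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            # Fail fast: skip the failing dependency entirely.
            return fallback()
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open after a failed trial) and restart the cool-down.
                self._opened_at = self._clock()
            return fallback()
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def render_product_page(product_id: str,
                        fetch_recs: Callable[[str], List[str]],
                        breaker: CircuitBreaker) -> Dict[str, object]:
    """Recommendations are non-critical: the page renders with or without them."""
    degraded = False

    def no_recommendations() -> List[str]:
        nonlocal degraded
        degraded = True  # surfaced so monitoring can alert on degraded mode
        return []

    recs = breaker.call(lambda: fetch_recs(product_id), no_recommendations)
    return {"product_id": product_id, "recommendations": recs, "degraded": degraded}
```

Returning an explicit `degraded` flag is one possible way to support item (8): the caller can count degraded responses and alert when they persist. In a real service the fallback might return cached recommendations instead of an empty list.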

Why

No system achieves 100% availability: sooner or later a dependency will fail. A system that degrades gracefully still provides value during partial failures, and users generally prefer getting most features over getting nothing.
