principle, Major, pending
Graceful degradation — designing systems that fail partially
graceful degradation, fallback, load shedding, partial failure, circuit breaker, resilience
linux, kubernetes
Problem
When a dependency fails, the entire application crashes or returns errors. Users can't do anything even when most of the system is working fine.
Solution
(1) Identify critical vs. non-critical dependencies. If the recommendations service is down, still show the product page, just without recommendations.
(2) Implement fallbacks: cached data, default values, simplified responses.
(3) Use circuit breakers to stop calling a failing service and switch to the fallback immediately.
(4) Put a timeout on every call: no unbounded waits. Prefer fast failure over slow failure.
(5) Shed load: under extreme load, reject some requests early (return 503) rather than degrading every request.
(6) Use priority queues: process important requests first.
(7) Use feature flags: disable expensive features during incidents.
(8) Monitor and alert on degraded mode: it should be temporary, not permanent.
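Points (2) and (3) can be sketched as a small circuit breaker that wraps a non-critical call and returns a fallback when the call fails or the circuit is open. This is a minimal illustration, not a production implementation: the `CircuitBreaker` class, the `fetch_recommendations` and `render_product_page` functions, and the threshold and cooldown values are all hypothetical names and numbers chosen for the example. It also assumes the wrapped call enforces its own timeout, as point (4) requires.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after N consecutive failures,
    then allows a trial call once a cooldown has elapsed."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # assumed value
        self.reset_after = reset_after              # cooldown in seconds (assumed)
        self.clock = clock
        self.failures = 0
        self.fallback_count = 0  # export as a metric to alert on degraded mode
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cooldown has passed, let a trial call through (half-open).
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            # Fail fast: do not touch the failing dependency at all.
            self.fallback_count += 1
            return fallback()
        try:
            result = primary()  # primary is assumed to enforce its own timeout
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            self.fallback_count += 1
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def fetch_recommendations() -> List[str]:
    # Stand-in for a call to a non-critical service that is currently down.
    raise ConnectionError("recommendations service is unavailable")


def render_product_page(breaker: CircuitBreaker) -> dict:
    # The product itself is critical; recommendations degrade to an empty list.
    recommendations = breaker.call(fetch_recommendations, lambda: [])
    return {"product": "widget", "recommendations": recommendations}
```

With this sketch, `render_product_page(CircuitBreaker())` still returns the product with an empty recommendations list while the service is down. The `fallback_count` counter is one possible hook for the monitoring in point (8); a real system would more likely emit it to its metrics pipeline.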
Why
No system achieves 100% availability; sooner or later some dependency will fail. A system that degrades gracefully still provides value during partial failures. Users prefer getting most features over getting nothing.