Principle: Errors should be loud, recoveries should be quiet
Tags: error handling, alert fatigue, log levels, silent failure, observability
Problem
Systems that silently swallow errors make debugging impossible. Systems that are noisy about normal recovery create alert fatigue.
Solution
Set the right noise level for each situation:
Errors should be LOUD:
- Unexpected failures: log at ERROR, include stack trace
- Data corruption: alert immediately, stop processing
- Failed preconditions: throw/panic early with context
- Never catch and ignore exceptions silently
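A minimal sketch of the "loud" rules above, assuming a payment-style workload. The names apply_payment, process_batch, and DataCorruptionError are hypothetical and chosen only for illustration: preconditions fail early with the offending values in the message, and a corrupt record is logged at ERROR and stops the batch instead of being skipped.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


class DataCorruptionError(Exception):
    """Hypothetical exception: a record failed an integrity check."""


def apply_payment(balance_cents: int, amount_cents: int) -> int:
    # Failed precondition: raise early and include the values that broke it.
    if amount_cents <= 0:
        raise ValueError(f"amount_cents must be positive, got {amount_cents}")
    if amount_cents > balance_cents:
        raise ValueError(
            f"insufficient balance: balance={balance_cents}, amount={amount_cents}"
        )
    return balance_cents - amount_cents


def process_batch(records: list) -> int:
    """Sum the 'amount' fields, stopping at the first corrupt record."""
    total = 0
    for index, record in enumerate(records):
        if not isinstance(record.get("amount"), int):
            # Data corruption: log loudly and stop rather than skip the record.
            logger.error("Corrupt record at index %d: %r", index, record)
            raise DataCorruptionError(f"record {index} has no integer 'amount'")
        total += record["amount"]
    return total
```

In a real system the ERROR log would typically also feed an alerting channel; that wiring is outside this sketch.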
Recoveries should be QUIET:
- Retry succeeded: log at DEBUG, not WARN
- Circuit breaker opened: log at INFO once per state change, not on every rejected request
- Fallback to cache: log at INFO, include why
- Connection reconnected: log at INFO, include downtime duration
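One way to keep these recoveries quiet is to log on state transitions only. The sketch below assumes a hypothetical ConnectionMonitor helper (not part of any library): the first failure is logged once, repeated failures stay silent, and the reconnect message carries the downtime duration.

```python
import logging
import time

logger = logging.getLogger("recovery")  # hypothetical logger name


class ConnectionMonitor:
    """Hypothetical helper: logs state changes, not every failed call."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._down_since = None  # None means the connection is considered up

    def record_failure(self) -> bool:
        """Return True if this failure changed the state from up to down."""
        if self._down_since is not None:
            return False  # already down: stay quiet
        self._down_since = self._clock()
        logger.info("Connection lost; further failures will not be logged")
        return True

    def record_success(self):
        """Return the downtime in seconds if this call ended an outage, else None."""
        if self._down_since is None:
            return None
        downtime = self._clock() - self._down_since
        self._down_since = None
        logger.info("Connection reconnected after %.1fs of downtime", downtime)
        return downtime
```

The injectable clock is a design choice for testability; the same transition-only pattern would apply to a circuit breaker's open and close events.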
Examples:
# BAD: Silent failure
try:
    important_operation()
except Exception:
    pass  # What happened? Nobody knows.
# BAD: Noisy recovery
for attempt in range(3):
    try:
        result = api_call()
        break
    except TimeoutError:
        logger.error('API TIMEOUT! RETRYING!')  # Alert fatigue
# GOOD: Loud error, quiet recovery
for attempt in range(3):
    try:
        result = api_call()
        if attempt > 0:
            logger.debug(f'Succeeded after {attempt} retries')
        break
    except TimeoutError:
        if attempt == 2:
            logger.error('API call failed after 3 attempts', exc_info=True)
            raise
Why
If everything is an error, nothing is. Alert fatigue causes real errors to be missed. But silent failures mean problems go undetected until they cascade.
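To make the link between log levels and alerts concrete, here is a sketch using the standard logging module. AlertHandler and build_logger are hypothetical names, and the in-memory list stands in for a real pager or alerting integration: the handler's threshold is ERROR, so quiet recovery messages never reach it.

```python
import logging


class AlertHandler(logging.Handler):
    """Hypothetical stand-in for a pager or alerting integration."""

    def __init__(self):
        super().__init__(level=logging.ERROR)  # only ERROR and above reach it
        self.alerts = []

    def emit(self, record):
        self.alerts.append(record.getMessage())


def build_logger(name: str = "service"):
    """Return a logger plus its alert handler, wired for this sketch."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # the logger itself accepts every level
    logger.propagate = False        # keep the example isolated from root handlers
    logger.handlers.clear()
    alert_handler = AlertHandler()
    logger.addHandler(alert_handler)
    return logger, alert_handler


logger, alert_handler = build_logger()
logger.debug("Succeeded after 1 retries")               # quiet: no alert
logger.info("Falling back to cache: upstream timeout")  # quiet: no alert
logger.error("API call failed after 3 attempts")        # loud: alert raised
```

If recoveries were logged at ERROR instead, every retry would land in the same alert list as the real failure, which is the fatigue described above.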
Context
Designing error handling and logging strategies for production systems