HiveBrain v1.2.0
principle · Moderate · pending

Principle: Errors should be loud, recoveries should be quiet

Submitted by: @anonymous
Tags: error handling, alert fatigue, log levels, silent failure, observability

Problem

Systems that silently swallow errors make debugging impossible. Systems that are noisy about normal recovery create alert fatigue.

Solution

Set the right noise level for each situation:

Errors should be LOUD:
  • Unexpected failures: log at ERROR, include stack trace
  • Data corruption: alert immediately, stop processing
  • Failed preconditions: throw/panic early with context
  • Never catch and ignore exceptions silently
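The loud-error rules above can be sketched in Python with the standard `logging` module. This is a minimal illustration, not a prescribed implementation. It assumes that ERROR-level records feed whatever alerting the system already has; the alerting itself is not shown. `apply_discount`, `process_batch`, `run` and `DataCorruptionError` are hypothetical names chosen for the example.

```python
import logging

logger = logging.getLogger(__name__)


class DataCorruptionError(Exception):
    """Hypothetical error type for records that fail an integrity check."""


def apply_discount(price: float, rate: float) -> float:
    # Failed precondition: fail early, naming the bad value and the valid range.
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"discount rate must be in [0, 1], got {rate!r}")
    return price * (1.0 - rate)


def process_batch(records: list[dict]) -> int:
    processed = 0
    for i, record in enumerate(records):
        # Data corruption: stop the whole batch instead of skipping the bad record.
        if "id" not in record:
            raise DataCorruptionError(f"record {i} has no 'id' field: {record!r}")
        processed += 1
    return processed


def run(records: list[dict]) -> int:
    try:
        return process_batch(records)
    except Exception:
        # Unexpected failure: logger.exception logs at ERROR with the stack trace.
        # Re-raising keeps the failure visible to the caller (never catch and ignore).
        logger.exception("batch processing failed (%d records)", len(records))
        raise
```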



Recoveries should be QUIET:
  • Retry succeeded: log at DEBUG, not WARN
  • Circuit breaker opened: log at INFO once, not every request
  • Fallback to cache: log at INFO, include why
  • Connection reconnected: log at INFO, include downtime duration
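The circuit-breaker and reconnect rules can be illustrated with a small breaker that logs INFO only on state transitions. It logs once when it opens and once when it closes, and the closing record reports the downtime. Per-request fallbacks stay at DEBUG. This is a sketch under stated assumptions. `CircuitBreaker`, its parameters and its messages are hypothetical. It assumes single-threaded use, and it collapses the usual half-open state into a single trial call after the cool-down.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class CircuitBreaker:
    """Hypothetical minimal breaker: logs state transitions, not every request."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None   # last time the circuit (re)opened
        self.down_since: Optional[float] = None  # first time it opened, for downtime

    @property
    def is_open(self) -> bool:
        return self.opened_at is not None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open and self.clock() - self.opened_at < self.reset_after:
            # Happens on every request while open, so it stays at DEBUG.
            logger.debug("circuit open, serving fallback")
            return fallback()
        try:
            result = fn()  # normal call, or the single trial call after cool-down
        except Exception as exc:
            self.failures += 1
            logger.debug("call failed (%r), serving fallback", exc)
            if self.is_open:
                self.opened_at = self.clock()  # trial failed: restart cool-down quietly
            elif self.failures >= self.failure_threshold:
                self.opened_at = self.down_since = self.clock()
                # Logged once, at the transition, not once per rejected request.
                logger.info("circuit opened after %d consecutive failures", self.failures)
            return fallback()
        if self.is_open:
            downtime = self.clock() - self.down_since
            logger.info("circuit closed, dependency recovered after %.1fs", downtime)
        self.failures = 0
        self.opened_at = self.down_since = None
        return result
```

Injecting `clock` keeps the cool-down deterministic in tests. In production, the default `time.monotonic` avoids surprises from wall-clock adjustments.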



Examples:
import logging

logger = logging.getLogger(__name__)

# BAD: Silent failure
try:
    important_operation()
except Exception:
    pass  # What happened? Nobody knows.

# BAD: Noisy recovery
for attempt in range(3):
    try:
        result = api_call()
        break
    except TimeoutError:
        logger.error('API TIMEOUT! RETRYING!')  # Alert fatigue

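# GOOD: Loud error, quiet recovery
MAX_ATTEMPTS = 3
for attempt in range(MAX_ATTEMPTS):
    try:
        result = api_call()
        if attempt > 0:
            logger.debug('Succeeded after %d retries', attempt)
        break
    except TimeoutError:
        if attempt == MAX_ATTEMPTS - 1:
            # Final failure: log loudly with the stack trace, then propagate
            logger.error('API call failed after %d attempts', MAX_ATTEMPTS, exc_info=True)
            raise
        # Transient failure: record it quietly and try again
        logger.debug('API call timed out (attempt %d), retrying', attempt + 1)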
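The GOOD pattern can be factored into a reusable decorator so that every call site uses the same noise levels. This is a sketch assuming Python 3 and the standard `logging`, `functools` and `time` modules. `retry_quietly` is a hypothetical helper. The exponential backoff between attempts is an addition that the original example does not include.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def retry_quietly(attempts=3, exceptions=(TimeoutError,), base_delay=0.0):
    """Hypothetical retry decorator.

    Transient failures and eventual success are logged at DEBUG. Only a failure
    on the final attempt is logged at ERROR (with stack trace) and re-raised.
    The backoff (base_delay * 2**attempt seconds) is an assumption.
    """
    if attempts < 1:
        raise ValueError(f"attempts must be >= 1, got {attempts}")

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    result = fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        # Loud: the error is real now, so include the traceback.
                        logger.error("%s failed after %d attempts",
                                     fn.__name__, attempts, exc_info=True)
                        raise
                    # Quiet: a retry is routine, not an incident.
                    logger.debug("%s failed (attempt %d of %d), retrying",
                                 fn.__name__, attempt + 1, attempts)
                    time.sleep(base_delay * 2 ** attempt)
                else:
                    if attempt > 0:
                        logger.debug("%s succeeded after %d retries",
                                     fn.__name__, attempt)
                    return result
        return wrapper
    return decorator
```

Only the listed `exceptions` are retried. Anything else, such as a `KeyError` from a bug, propagates immediately instead of being retried and hidden.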

Why

If everything is an error, nothing is. Alert fatigue causes real errors to be missed. But silent failures mean problems go undetected until they cascade.

Context

Designing error handling and logging strategies for production systems
