Kubernetes Health Checks: Liveness vs Readiness vs Startup Probes
Problem
Misconfigured Kubernetes probes cause cascading failures: liveness probes restart pods that are healthy but slow, readiness probes that pass too early let traffic reach pods that are not ready, and startup probes with too small a budget time out before slow-starting apps finish booting.
Solution
Each probe serves a different purpose, and each triggers a different action when it fails:
# Startup probe: is the app started?
# Only runs at startup. Prevents liveness from killing slow starters.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30  # 30 * 10s = 5 minutes to start
  periodSeconds: 10

# Liveness probe: is the app stuck?
# Restarts the container if it fails.
# ONLY checks if the process is alive, NOT dependencies.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0  # Startup probe handles delay
  periodSeconds: 15
  timeoutSeconds: 3
  failureThreshold: 3

# Readiness probe: can the app handle traffic?
# Removes from service endpoints if it fails.
# CAN check dependencies (DB, cache).
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

Critical rules:
- Liveness must NOT check dependencies (DB down -> all pods restart -> cascading failure)
- Readiness CAN check dependencies (DB down -> stop sending traffic -> graceful degradation)
- Use startup probes for slow starters (Java, .NET) instead of large initialDelaySeconds
- Liveness endpoint should be cheap - no DB queries, just return 200
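The split between the two endpoints can be sketched on the application side. This is a minimal illustration using only the Python standard library, not a prescribed implementation: `probe_status`, `database_is_reachable`, and `serve` are hypothetical names, and the dependency check is a placeholder that a real service would replace with a short-timeout ping.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def database_is_reachable() -> bool:
    """Hypothetical placeholder; a real check would ping the DB with a short timeout."""
    return True


def probe_status(path: str, check_dependency: Callable[[], bool]) -> int:
    """Return the HTTP status code for a probe request path."""
    if path == "/healthz":
        # Liveness/startup: the process can answer HTTP. The dependency
        # check is deliberately never called on this path.
        return 200
    if path == "/ready":
        # Readiness: answer 503 while the dependency is down, so the pod is
        # taken out of Service endpoints without being restarted.
        return 200 if check_dependency() else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(probe_status(self.path, database_is_reachable))
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Serve probe endpoints on the port used in the probe config (blocks forever)."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Calling `serve()` starts the listener on port 8080. Passing the dependency check in as a callable keeps `/healthz` free of any database call, so a database outage can fail readiness without ever failing liveness.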
Why
Kubernetes acts on probe results automatically, with no human in the loop. A wrong probe config therefore produces wrong actions: restarting healthy pods, sending traffic to broken ones, or killing slow-starting apps.
Gotchas
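- Liveness checking DB health = one DB outage fails liveness on every pod at once, so ALL pods restart simultaneously
- initialDelaySeconds on liveness is a fixed wait: too long for fast starts, too short for slow ones. A startup probe is adaptive and better: liveness and readiness checks are held off until it succeeds, and it succeeds as soon as the app is up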
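The "30 * 10s = 5 minutes" comment in the startup probe config generalizes to all three probes. The sketch below applies the same arithmetic to the values from the Solution section. `failure_window_seconds` is a hypothetical helper, and the result is a rough figure that ignores timeoutSeconds and where in the probe period the failure begins.

```python
def failure_window_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Rough time a probe must keep failing before Kubernetes acts on it.

    Approximation: ignores timeoutSeconds and where in the probe period
    the failure begins.
    """
    return period_seconds * failure_threshold


# Values copied from the probe configuration in the Solution section.
startup_budget = failure_window_seconds(period_seconds=10, failure_threshold=30)
liveness_window = failure_window_seconds(period_seconds=15, failure_threshold=3)
readiness_window = failure_window_seconds(period_seconds=5, failure_threshold=3)

print(startup_budget, liveness_window, readiness_window)  # prints: 300 45 15
```

With these settings a dependency outage lasting around 45 seconds would be enough to restart every pod if the liveness endpoint checked the database, while readiness would already have pulled the pods out of rotation after roughly 15 seconds.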
Context
Configuring Kubernetes health checks correctly