Application Performance Monitoring (APM) Setup Checklist
Tags: APM, monitoring, tracing, RED method, p99 latency, observability, alerting
Problem
An application is slow or unreliable in production, but there is no visibility into where time is spent, which requests are failing, or what is degrading.
Solution
Essential APM instrumentation:
1. Request tracing (most important)
2. Key metrics (RED method)
3. Database monitoring
4. External dependency tracking
5. Custom business metrics
6. Alerting thresholds
Tools: Datadog, New Relic, Grafana + Prometheus, Sentry (errors), Jaeger (tracing).
1. Request tracing (most important)
- Trace every HTTP request: method, path, status, duration
- Add trace IDs that propagate across services
- Sample high-traffic endpoints (e.g., 10% of GET /api/feed)
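
The three points above can be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for a tracing library: the `X-Trace-Id` header name, the `traced_request` wrapper, and the `sink` callback are all hypothetical names chosen for this example. Sampling hashes the trace ID so that every service reaches the same keep-or-drop decision for a given trace.

```python
import hashlib
import time
import uuid

# Hypothetical header name for this sketch; real systems often use the
# W3C Trace Context "traceparent" header instead.
TRACE_HEADER = "X-Trace-Id"


def get_or_create_trace_id(headers):
    """Reuse the caller's trace ID if one was sent, otherwise start a new trace."""
    return headers.get(TRACE_HEADER) or uuid.uuid4().hex


def should_sample(trace_id, rate):
    """Hash the trace ID into [0, 1) so every service makes the same decision."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest()[:8], 16) / 2**32
    return bucket < rate


def traced_request(method, path, headers, handler, sample_rate=1.0, sink=print):
    """Run handler() and record method, path, status, and duration for the request."""
    trace_id = get_or_create_trace_id(headers)
    start = time.perf_counter()
    status = 500  # reported if the handler raises
    try:
        status = handler()
        # Forward these headers on any downstream call so the trace ID propagates.
        return status, {TRACE_HEADER: trace_id}
    finally:
        if should_sample(trace_id, sample_rate):
            sink({
                "trace_id": trace_id,
                "method": method,
                "path": path,
                "status": status,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })
```

Calling `traced_request("GET", "/api/feed", incoming_headers, handler, sample_rate=0.1)` would keep roughly 10% of traces for that endpoint while still propagating the trace ID on every request.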
2. Key metrics (RED method)
- Rate: requests per second
- Errors: error rate percentage
- Duration: p50, p95, p99 latency
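
As a rough sketch of how the three RED numbers fall out of raw request records, the following assumes each request is stored as a status code and a duration. The `Request` type and `red_summary` function are hypothetical, errors are assumed to mean 5xx responses, and percentiles use the nearest-rank definition (metrics backends often interpolate or use histograms instead).

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int
    duration_ms: float


def percentile(sorted_values, p):
    """Nearest-rank percentile: the value at rank ceil(p% of n) in sorted order."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_summary(requests, window_seconds):
    """Compute Rate, Errors, and Duration for the requests seen in one window."""
    if not requests:
        return {"rate_rps": 0.0, "error_rate_pct": 0.0,
                "p50_ms": None, "p95_ms": None, "p99_ms": None}
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    return {
        "rate_rps": len(requests) / window_seconds,
        "error_rate_pct": 100 * errors / len(requests),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }
```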
3. Database monitoring
- Query duration and count
- Slow query log (>100ms)
- Connection pool utilization
- N+1 query detection
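
One way to get query count, duration, a slow query log, and a crude N+1 signal without a vendor agent is to wrap every query call. The sketch below is an assumption-laden illustration: `QueryMonitor`, `normalize`, and the `N_PLUS_ONE_LIMIT` of 10 repeats are hypothetical, and the regex-based normalization is far simpler than what a real APM agent does. Connection pool utilization is not covered because it depends on the specific driver.

```python
import re
import time
from collections import Counter

SLOW_QUERY_MS = 100    # slow query threshold from the checklist
N_PLUS_ONE_LIMIT = 10  # hypothetical: same query shape this often in one request


def normalize(sql):
    """Replace literals with '?' so queries differing only by value share a shape."""
    sql = re.sub(r"'[^']*'", "?", sql)
    sql = re.sub(r"\b\d+\b", "?", sql)
    return re.sub(r"\s+", " ", sql).strip().lower()


class QueryMonitor:
    """Create one instance per request and route every query through run()."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.count = 0
        self.total_ms = 0.0
        self.slow = []           # (sql, elapsed_ms) pairs above the threshold
        self.shapes = Counter()  # normalized query -> number of executions

    def run(self, sql, execute):
        """Time execute(sql) and record it, even when the query raises."""
        start = self.clock()
        try:
            return execute(sql)
        finally:
            elapsed_ms = (self.clock() - start) * 1000
            self.count += 1
            self.total_ms += elapsed_ms
            if elapsed_ms > SLOW_QUERY_MS:
                self.slow.append((sql, elapsed_ms))
            self.shapes[normalize(sql)] += 1

    def suspected_n_plus_one(self):
        """Query shapes repeated often enough to suggest a query-per-row loop."""
        return [s for s, n in self.shapes.items() if n >= N_PLUS_ONE_LIMIT]
```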
4. External dependency tracking
- API call duration and error rates
- Circuit breaker state
- Timeout frequency
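
To make "circuit breaker state" and "timeout frequency" concrete, here is a minimal circuit breaker that also keeps the counters worth exporting. The class name, thresholds, and `stats` keys are assumptions for this sketch, and it treats Python's built-in `TimeoutError` as the timeout signal, which may not match what a given HTTP client raises.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """closed -> open after N consecutive failures -> half_open after a cooldown."""

    def __init__(self, failure_threshold=5, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        # Counters to export as dependency metrics.
        self.stats = {"calls": 0, "errors": 0, "timeouts": 0,
                      "rejected": 0, "total_ms": 0.0}

    def call(self, func):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_after_s:
                self.stats["rejected"] += 1
                raise CircuitOpenError("dependency circuit is open")
            self.state = "half_open"  # cooldown elapsed: allow one trial call
        start = self.clock()
        self.stats["calls"] += 1
        try:
            result = func()
        except Exception as exc:
            self.stats["errors"] += 1
            if isinstance(exc, TimeoutError):
                self.stats["timeouts"] += 1
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        else:
            self.failures = 0
            self.state = "closed"
            return result
        finally:
            self.stats["total_ms"] += (self.clock() - start) * 1000
```

Error rate is then `stats["errors"] / stats["calls"]`, and `state` can be exported as a gauge so dashboards show when a dependency is being short-circuited.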
5. Custom business metrics
- Sign-ups per minute
- Orders processed
- Payment success/failure rate
- Queue depth and processing time
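
Business metrics are usually simple counters and ratios. As a hedged sketch, a per-minute rate such as sign-ups can be tracked with a sliding window, and payment success can be reported as a percentage. `RateCounter` and `success_rate_pct` are hypothetical helpers, not part of any metrics library; in practice these values would be sent to whichever metrics backend is in use.

```python
import time
from collections import deque


class RateCounter:
    """Counts events inside a sliding window, e.g. sign-ups per minute."""

    def __init__(self, window_s=60.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.events = deque()  # timestamps of recorded events, oldest first

    def record(self):
        self.events.append(self.clock())

    def count(self):
        """Drop events older than the window and return how many remain."""
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        return len(self.events)


def success_rate_pct(succeeded, failed):
    """Payment success rate as a percentage, or None when there is no traffic."""
    total = succeeded + failed
    return 100 * succeeded / total if total else None
```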
6. Alerting thresholds
# Alert on symptoms, not causes
alerts:
  - name: High Error Rate
    condition: error_rate > 1% for 5m
    severity: critical
  - name: High Latency
    condition: p99_latency > 2s for 5m
    severity: warning
  - name: Saturation
    condition: cpu_usage > 80% for 10m
    severity: warning
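
The `for 5m` part of each condition means the threshold must be breached continuously for that long before the alert fires, which filters out brief spikes. A minimal sketch of that logic, assuming a hypothetical `AlertRule` class that is fed one metric sample at a time with a timestamp in seconds:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    name: str
    threshold: float
    for_seconds: float
    severity: str
    breach_started: Optional[float] = None  # when the current breach began

    def evaluate(self, value, now):
        """Return True once value has stayed above threshold for for_seconds."""
        if value <= self.threshold:
            self.breach_started = None  # any recovery resets the timer
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.for_seconds
```

For example, `AlertRule("High Error Rate", threshold=1.0, for_seconds=300, severity="critical")` stays silent through a two-minute spike and fires only after five uninterrupted minutes above 1%.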
Why
You can't fix what you can't see. APM turns debugging from 'something is slow' into 'the /api/checkout endpoint p99 is 3s because the PostgreSQL query on line 42 takes 2.8s'.
Gotchas
- Don't alert on metrics that don't require human action
- p99 matters more than the average: an average can look healthy while 1% of users still have a terrible experience
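
A quick worked example of the p99 point, using made-up numbers: if 980 of 1,000 requests take 50 ms and the remaining 20 take 3 s, the average looks acceptable while the tail is clearly broken.

```python
import math
import statistics

# Hypothetical sample: 980 fast requests and 20 very slow ones.
durations_ms = [50] * 980 + [3000] * 20

mean_ms = statistics.mean(durations_ms)
median_ms = statistics.median(durations_ms)

# Nearest-rank p99: the value at rank ceil(0.99 * n) in sorted order.
rank = math.ceil(99 * len(durations_ms) / 100)
p99_ms = sorted(durations_ms)[rank - 1]

print(f"mean={mean_ms} ms, median={median_ms} ms, p99={p99_ms} ms")
```

Here the mean is 109 ms and the median is 50 ms, but p99 is 3000 ms, so only the percentile reveals the slow requests.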
Context
Setting up production monitoring