pattern, Moderate, pending
Debugging production issues with limited access — techniques and tools
production debugging, structured logs, error tracking, Sentry, canary deploy, feature flags
linux, kubernetes
Problem
The bug reproduces only in production. You cannot attach a debugger, the existing logs are insufficient, and adding more logging requires a deploy. You need to diagnose the issue without disrupting the service.
Solution
(1) Structured logs: if your services already emit structured logs, filter and query them by request ID, user ID, and error type.
(2) Distributed tracing: trace the specific failing request through every service it touches.
(3) Feature flags: put verbose logging behind a flag that can be enabled per user or per request without deploying.
(4) Log sampling: temporarily raise the log level for a percentage of requests.
(5) Error tracking (Sentry, Bugsnag): captures stack traces, request data, and breadcrumbs automatically.
(6) Replay: if you log request/response pairs, replay the failing request locally.
(7) Core dumps / heap dumps: configure the process to capture them automatically on crash, then analyze them offline.
(8) Canary deploys: deploy a fix to a subset of traffic first.
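Items (1), (3), and (4) can be combined in a small amount of code. The following is a minimal sketch using only the Python standard library, not a definitive implementation. The names `DEBUG_FLAGS`, `verbose_enabled`, `RequestFilter`, `JsonFormatter`, `build_logger`, and `handle_request` are hypothetical, and the in-memory flag dictionary stands in for whatever feature-flag or config service you actually use.

```python
import contextvars
import hashlib
import json
import logging

# Hypothetical in-memory flag store. In a real service this would be read
# from a feature-flag or config system so it can change without a deploy.
DEBUG_FLAGS = {"user_ids": set(), "sample_percent": 0}

# Per-request context, visible to the logging filter below.
request_ctx = contextvars.ContextVar("request_ctx", default={})


def verbose_enabled(request_id: str, user_id: str) -> bool:
    """Decide whether this request should emit DEBUG logs."""
    if user_id in DEBUG_FLAGS["user_ids"]:
        return True
    # Stable hash, so the same request ID always gets the same decision.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < DEBUG_FLAGS["sample_percent"]


class RequestFilter(logging.Filter):
    """Tag every record with request fields; drop DEBUG unless opted in."""

    def filter(self, record):
        ctx = request_ctx.get()
        record.request_id = ctx.get("request_id", "-")
        record.user_id = ctx.get("user_id", "-")
        if record.levelno < logging.INFO:
            return ctx.get("verbose", False)
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so logs can be queried by field."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "request_id": record.request_id,
            "user_id": record.user_id,
        })


def build_logger(stream=None) -> logging.Logger:
    logger = logging.getLogger("app")
    logger.setLevel(logging.DEBUG)  # the filter decides what is emitted
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # defaults to stderr
    handler.addFilter(RequestFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, request_id: str, user_id: str) -> None:
    """Hypothetical request handler showing where the context is set."""
    request_ctx.set({
        "request_id": request_id,
        "user_id": user_id,
        "verbose": verbose_enabled(request_id, user_id),
    })
    logger.debug("cache lookup details")  # emitted only for opted-in requests
    logger.info("request handled")
```

Adding a user ID to `DEBUG_FLAGS["user_ids"]` turns on DEBUG output for that user only, and raising `DEBUG_FLAGS["sample_percent"]` turns it on for roughly that share of request IDs. Every line carries `request_id` and `user_id`, so the extra output can be filtered in the log store. The flag check has to ship before the incident; only the flag value changes at runtime.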
Why
Production debugging is constrained by access, risk, and time pressure. The tools you can use are limited to what you set up before the incident. Invest in observability before you need it.