gotcha · javascript · Critical
Cardinality explosion: high-cardinality labels kill Prometheus performance
Tags: cardinality, time series, Prometheus memory, label explosion, user_id label, route normalization, series limit, topk count
Problem
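Prometheus memory usage climbs sharply and query performance degrades severely after a label with high cardinality (user IDs, request IDs, IP addresses) is added to a metric. Memory grows with the number of active time series, and one such label can multiply that number by orders of magnitude. In extreme cases, Prometheus runs out of memory (OOM) and crashes.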
Solution
Never use high-cardinality values as Prometheus label values. Cardinality is multiplicative: 10 methods × 5 status codes × 1,000 routes = 50,000 potential time series. Adding a user_id label with 100,000 users multiplies that by 100,000, to as many as 5 billion series.
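A quick way to sanity-check a label design before shipping it is to multiply the number of distinct values each label can take. The `worstCaseSeries` helper below is a hypothetical illustration, not part of any Prometheus client library. It gives an upper bound; the real series count is lower when not every combination actually occurs.

```javascript
// Hypothetical helper: worst-case series count for one metric is the product
// of the number of distinct values each label can take, because every
// combination of label values becomes its own time series.
function worstCaseSeries(labelValueCounts) {
  return Object.values(labelValueCounts).reduce((product, n) => product * n, 1);
}

const base = { method: 10, status_code: 5, route: 1000 };
console.log(worstCaseSeries(base)); // 50000

// The same metric with a user_id label for 100,000 users.
const withUserId = { ...base, user_id: 100000 };
console.log(worstCaseSeries(withUserId)); // 5000000000
```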
Diagnosing cardinality issues:
# Find metrics with the most time series
topk(10, count by (__name__)({__name__=~".+"}))
# Count series for a specific metric
count(http_requests_total)
Fixing cardinality issues:
- Remove the high-cardinality label entirely
- Replace it with a bounded, bucketed label, e.g. user_tier: 'free' | 'paid' | 'enterprise'
- Move high-cardinality data to traces or logs instead
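A tier field can still carry unexpected values in practice (typos, new plans, missing data), and each one would become a new series. A minimal sketch, assuming a hypothetical `boundedLabel` helper and a fallback value of `"other"`, that clamps any input to a fixed allow-list before it is used as a label value:

```javascript
// Hypothetical helper: clamp a label value to a fixed allow-list so the
// label can never take more than allowed.size + 1 distinct values.
const USER_TIERS = new Set(["free", "paid", "enterprise"]);

function boundedLabel(value, allowed, fallback = "other") {
  return allowed.has(value) ? value : fallback;
}

console.log(boundedLabel("paid", USER_TIERS));       // paid
console.log(boundedLabel("trial-2024", USER_TIERS)); // other
console.log(boundedLabel(undefined, USER_TIERS));    // other
```

The clamped value would then be passed where the tier is used as a label, e.g. `user_tier: boundedLabel(req.user.tier, USER_TIERS)` in the `httpRequests.inc` call.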
// BAD: user_id as a label
httpRequests.inc({ method: 'POST', user_id: req.user.id }); // NEVER do this
// GOOD: user tier as a label
httpRequests.inc({ method: 'POST', user_tier: req.user.tier });
// For per-user analysis: use distributed traces instead
Why
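Prometheus stores one time series per unique combination of metric name and label values. Every active series occupies memory in the in-memory head block (its label set, index entries, and recent samples), so memory scales with the number of series rather than with request volume. High-cardinality labels create millions of series, exhausting memory and slowing down every query that has to touch them.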
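To see why distinct label values, not traffic volume, drive the series count, the sketch below models a series as a metric name plus a sorted label set and counts distinct keys. The `seriesKey` and `countSeries` helpers are hypothetical and form a simplified in-memory model, not how the Prometheus TSDB is implemented; they only illustrate that every new label combination creates a new series.

```javascript
// Identify a series by metric name plus label set. Label names are sorted so
// that label order does not produce a different key; values are stringified
// because Prometheus label values are strings.
function seriesKey(name, labels) {
  const parts = Object.keys(labels)
    .sort()
    .map((k) => `${k}=${JSON.stringify(String(labels[k]))}`);
  return `${name}{${parts.join(",")}}`;
}

// Count distinct series among a list of { name, labels } samples.
function countSeries(samples) {
  const seen = new Set();
  for (const { name, labels } of samples) seen.add(seriesKey(name, labels));
  return seen.size;
}

// Simulated traffic: 1,000 distinct users, each sending GET, POST and PUT.
const requests = [];
for (let user = 0; user < 1000; user++) {
  for (const method of ["GET", "POST", "PUT"]) requests.push({ method, userId: `u${user}` });
}

const withoutUser = requests.map((r) => ({ name: "http_requests_total", labels: { method: r.method } }));
const withUser = requests.map((r) => ({
  name: "http_requests_total",
  labels: { method: r.method, user_id: r.userId },
}));

console.log(countSeries(withoutUser)); // 3
console.log(countSeries(withUser));    // 3000
```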
Gotchas
- Route patterns must be normalized — '/api/orders/123' and '/api/orders/456' must both map to '/api/orders/:id'
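- Labels that each look low-cardinality still multiply into a large total: 1,000 routes × 10 methods = 10,000 series before any other labels are added
- By default Prometheus applies no per-scrape sample limit, so it keeps ingesting new series until it runs out of memory
- In Kubernetes with the Prometheus Operator, cap samples per scrape target with sampleLimit on the ServiceMonitor or PodMonitor (e.g. spec.sampleLimit: 5000), or with sample_limit in a plain scrape config. PrometheusRule resources only define alerting and recording rules and cannot enforce this limit. A scrape that exceeds the limit fails as a whole, so set it comfortably above the target's normal series count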
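Route normalization from the first gotcha can be sketched as below. The `normalizeRoute` function and its ID patterns (all-digit segments, UUIDs, hex strings of 24+ characters) are assumptions for illustration; they will also rewrite legitimate all-digit segments such as a year. Where the web framework exposes the matched route template (for example Express's `req.route.path`), using that template is usually more reliable than guessing with regular expressions.

```javascript
// Hypothetical normalizer: replace path segments that look like IDs with ":id"
// so that every order URL collapses into a single route label value.
const ID_SEGMENT =
  /^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}|[0-9a-f]{24,})$/i;

function normalizeRoute(path) {
  const [pathname] = path.split("?"); // drop the query string, which is also unbounded
  return pathname
    .split("/")
    .map((segment) => (ID_SEGMENT.test(segment) ? ":id" : segment))
    .join("/");
}

console.log(normalizeRoute("/api/orders/123"));        // /api/orders/:id
console.log(normalizeRoute("/api/orders/456?page=2")); // /api/orders/:id
console.log(normalizeRoute("/api/users/3f2b9c1e-8d4a-4b7e-9c2f-1a2b3c4d5e6f/orders")); // /api/users/:id/orders
```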
Context
Debugging Prometheus memory issues or reviewing metrics before a production launch