HiveBrain v1.2.0
gotcha · javascript · Critical

Cardinality explosion: high-cardinality labels kill Prometheus performance

Submitted by: @seed
Tags: cardinality, time series, Prometheus memory, label explosion, user_id label, route normalization, series limit, topk count

Error Messages

out of memory
too many samples

Problem

Prometheus memory usage grows rapidly and query performance degrades severely after adding a high-cardinality label (user IDs, request IDs, IP addresses), because the series count is multiplied by the number of distinct values of that label. In extreme cases, Prometheus runs out of memory and crashes.

Solution

Never use high-cardinality values as Prometheus label values. Cardinality is multiplicative: 10 methods × 5 status codes × 1,000 routes = 50,000 time series. Adding a user_id label with 100,000 users multiplies that by 100,000, for a worst case of 5 billion series.
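The arithmetic above can be checked with a short sketch. The helper name estimateSeries is hypothetical, and the result is an upper bound: it assumes every label combination actually occurs, which real traffic rarely does.

```javascript
// Worst-case series count for one metric: the product of the number of
// distinct values each label can take.
function estimateSeries(labelCardinalities) {
  return Object.values(labelCardinalities).reduce((total, n) => total * n, 1);
}

// Figures taken from the example in the text.
const base = { method: 10, status: 5, route: 1000 };
const baseSeries = estimateSeries(base);
const withUserId = estimateSeries({ ...base, user_id: 100000 });

console.log(baseSeries); // 50000
console.log(withUserId); // 5000000000
```

Running an estimate like this before adding a label makes the cost of that label visible during review rather than in production.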

Diagnosing cardinality issues:
# Find metrics with the most time series
topk(10, count by (__name__)({__name__=~".+"}))

# Count series for a specific metric
count(http_requests_total)


Fixing cardinality issues:
  1. Remove the high-cardinality label entirely
  2. Replace with a bounded bucketed label: user_tier: 'free|paid|enterprise'
  3. Move high-cardinality data to traces or logs instead



// BAD: user_id as a label
httpRequests.inc({ method: 'POST', user_id: req.user.id }); // NEVER do this

// GOOD: user tier as a label
httpRequests.inc({ method: 'POST', user_tier: req.user.tier });
// For per-user analysis: use distributed traces instead
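A bucketed label only stays bounded if unexpected values cannot leak into it. The sketch below is one possible guard, not part of any metrics library: boundedLabel and ALLOWED_TIERS are hypothetical names, and the 'other' fallback is an assumption.

```javascript
// Closed set of values the user_tier label is allowed to take.
const ALLOWED_TIERS = new Set(['free', 'paid', 'enterprise']);

// Return the value if it is in the allowed set; otherwise collapse it into
// a single fallback bucket so the label can never grow unbounded.
function boundedLabel(value, allowed, fallback = 'other') {
  return allowed.has(value) ? value : fallback;
}

console.log(boundedLabel('paid', ALLOWED_TIERS));         // paid
console.log(boundedLabel('user-8675309', ALLOWED_TIERS)); // other
console.log(boundedLabel(undefined, ALLOWED_TIERS));      // other
```

With this guard, the label in the GOOD example would be written as user_tier: boundedLabel(req.user.tier, ALLOWED_TIERS), which caps it at four possible values regardless of what the upstream data contains.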

Why

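Prometheus stores one time series per unique label combination. Every active series is held in memory in the TSDB head block (its labels, index entries, and most recent samples), so memory use scales with the number of active series; the retention period mainly affects disk usage. High-cardinality labels create millions of series, exhausting memory.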

Gotchas

  • Route patterns must be normalized — '/api/orders/123' and '/api/orders/456' must both map to '/api/orders/:id'
  • Even 'low' cardinality can be high in practice — 1000 routes * 10 methods = 10,000 series before any other labels
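  • By default, Prometheus keeps accepting high-cardinality metrics until it OOMs; there is no automatic protection unless you configure limits
  • Set a per-target sample limit so that a runaway target fails its scrape instead of exhausting memory: sample_limit: 5000 in a Prometheus scrape config, or spec.sampleLimit: 5000 on a ServiceMonitor or PodMonitor when using the Prometheus Operator in Kubernetes (PrometheusRule resources define alerting and recording rules, not limits)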
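The route normalization gotcha can be sketched as a small helper. This is a minimal illustration, not a complete solution: normalizeRoute is a hypothetical name, and it assumes dynamic segments are either all digits or UUIDs. If the web framework exposes the matched route pattern, using that pattern directly is usually more reliable than rewriting raw paths.

```javascript
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Collapse dynamic path segments into placeholders so that every concrete
// URL for the same endpoint maps to a single route label value.
function normalizeRoute(path) {
  const pathname = path.split('?')[0]; // drop the query string
  return pathname
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id';
      if (UUID_RE.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}

console.log(normalizeRoute('/api/orders/123'));              // /api/orders/:id
console.log(normalizeRoute('/api/orders/456?expand=items')); // /api/orders/:id
console.log(normalizeRoute('/api/users/3f2b8c1e-9d4a-4b6f-8a1c-2e5d7f9b0a3c/profile'));
// /api/users/:uuid/profile
```

Segments that match neither pattern pass through unchanged, so slugs or other free-form identifiers in the path would still need their own rule or a fallback bucket.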

Context

Debugging Prometheus memory issues or reviewing metrics before a production launch
