pattern, javascript, Moderate

PagerDuty integration with Alertmanager: routing and deduplication

Submitted by: @seed

Alertmanager ^0.26, Prometheus ^2.x

pagerduty routing key, alertmanager route, group_by, deduplication, alert grouping, on-call routing, inhibit rules

Problem

All alerts go to the same PagerDuty service, so they wake up the wrong team or create duplicate incidents. Critical and warning alerts page with the same urgency, and a burst of 20 related alerts creates 20 separate PagerDuty incidents.

Solution

Configure Alertmanager routes to send to different PagerDuty services by team and severity, and use group_by to deduplicate related alerts into a single incident.

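# alertmanager.yml
route:
  # One notification (and one PagerDuty incident) per unique combination of these labels
  group_by: ['alertname', 'service', 'environment']
  group_wait: 30s        # wait before the first notification for a new group
  group_interval: 5m     # wait before notifying about alerts added to an existing group
  repeat_interval: 4h    # wait before re-sending a notification for a group that is still firing
  receiver: default      # fallback when no child route matches
  routes:
    # Sibling routes are evaluated in order and the first match wins,
    # so critical alerts from any team go to pagerduty-critical.
    # `matchers` replaces the deprecated `match` field.
    - matchers:
        - severity="critical"
      receiver: pagerduty-critical
    - matchers:
        - team="backend"
      receiver: pagerduty-backend

receivers:
  # Every receiver named in a route must be defined, or the config fails to load.
  # This fallback has no integrations, so alerts that reach it are not sent anywhere.
  - name: default
  - name: pagerduty-critical
    pagerduty_configs:
      # Alertmanager does not expand environment variables in its config file.
      # Substitute the ${...} placeholders before startup (for example with envsubst),
      # or use routing_key_file to read the key from a file.
      - routing_key: '${PAGERDUTY_CRITICAL_KEY}'
        severity: critical
        description: '{{ .CommonAnnotations.summary }}'
        details:
          runbook: '{{ .CommonAnnotations.runbook }}'
  - name: pagerduty-backend
    pagerduty_configs:
      - routing_key: '${PAGERDUTY_BACKEND_KEY}'
        severity: warning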
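
The config above sends every critical alert to one service regardless of team. If each team should receive its own critical pages, nested routes are one option. The following is a minimal sketch, not a drop-in config: the receiver name pagerduty-backend-critical is hypothetical and would need its own entry under receivers, and it assumes alerts carry the team and severity labels used above.

```yaml
route:
  receiver: default
  group_by: ['alertname', 'service', 'environment']
  routes:
    - matchers:
        - team="backend"
      # Used when the alert matches team="backend" but no child route below
      receiver: pagerduty-backend
      routes:
        - matchers:
            - severity="critical"
          # Hypothetical receiver: backend alerts that are also critical
          receiver: pagerduty-backend-critical
```

An alert that matches a parent route is tested against that route's children; if none match, the parent's receiver handles it.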

Why

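Grouping is what keeps a cascading failure from producing hundreds of pages: Alertmanager sends one notification per group of alerts that share the same group_by label values, so PagerDuty opens one incident per group rather than one per alert. group_wait gives related alerts time to arrive before the first notification is sent, and group_interval sets how long Alertmanager waits before notifying about alerts that join an existing group.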
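
As a rough illustration of how these timers interact, the fragment below annotates the values from the config with a hypothetical burst of alerts that all share the same alertname, service, and environment. The timestamps are illustrative, not measured.

```yaml
route:
  group_by: ['alertname', 'service', 'environment']
  # t=0s   first alert of the burst arrives and a new group is created
  # t=30s  first notification goes out, covering every alert collected so far
  group_wait: 30s
  # Alerts that join the group after t=30s are held and sent together
  # at the next group_interval tick, rather than one notification each
  group_interval: 5m
  # If the group is still firing and nothing has changed,
  # the notification is repeated after this long
  repeat_interval: 4h
```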

Gotchas

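  • group_by with only alertname merges alerts that share a name across different services and environments into one incident, which hides which service is affected; include the service and environment labels. The special value group_by: ['...'] disables grouping entirely, so every alert becomes its own notification
  • repeat_interval controls re-notification for ongoing alerts; set it longer than your expected incident resolution time
  • A PagerDuty routing key is the integration key of an Events API v2 integration, and each integration belongs to a single service; create a separate service, with its own key, for each team
  • inhibit_rules in Alertmanager can suppress child alerts when a parent alert fires, which is essential for avoiding noise during full outages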
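
A minimal inhibit_rules sketch, assuming the severity, service, and environment labels from the config above. The choice of critical as the "parent" and warning as the "child" is an illustration only; adapt the matchers to whatever alerts signal a full outage in your setup.

```yaml
inhibit_rules:
  # While a critical alert is firing, mute warning alerts
  # that have the same service and environment label values.
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['service', 'environment']
```

The equal list matters: without it, a critical alert on any service would mute warnings on every service.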

Context

Configuring Prometheus Alertmanager to route alerts to PagerDuty with proper deduplication
