pattern, javascript, Moderate
Uptime monitoring: external black-box checks with Uptime Kuma or Pingdom
Uptime Kuma ^1.23
uptime monitoring, external probe, black-box monitoring, Uptime Kuma, certificate expiry, keyword check, push monitor, cron job monitoring
Problem
Internal Prometheus metrics show the service as healthy but users report they cannot access the application. Internal monitoring has a blind spot: it cannot detect DNS failures, TLS certificate expiry, CDN issues, or problems that only manifest from outside the network.
Solution
Implement external uptime monitoring that probes your endpoints from outside your infrastructure.
Uptime Kuma (self-hosted, free):
- HTTP/HTTPS checks with keyword assertion
- TCP port checks
- DNS resolution checks
- Push-based monitoring (service calls Uptime Kuma, useful for background jobs)
- Status page built-in
What to monitor:
- Primary user-facing URL with keyword check (verifies not just 200 OK but actual content)
- API health endpoint from external probe
- Certificate expiry check (alert at 30 days before expiry)
- DNS resolution for all CNAME/A records
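The certificate expiry item above can be sketched as a small Node.js check. This is an illustration under stated assumptions, not how Uptime Kuma implements it: the function names are hypothetical, and the 30-day and 7-day thresholds are the ones this entry recommends. Note that with Node's default settings an already expired certificate makes the TLS handshake fail, which should itself be treated as an alert.

```javascript
// Sketch: classify a certificate by the number of days left before it expires.
// Function names are illustrative; thresholds (30 and 7 days) come from this entry.
const tls = require("node:tls");

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between `now` and the certificate's notAfter date (negative once expired).
function daysUntilExpiry(validTo, now = new Date()) {
  return Math.floor((new Date(validTo).getTime() - now.getTime()) / MS_PER_DAY);
}

// Maps days remaining to an alert level.
function expiryAlertLevel(daysLeft) {
  if (daysLeft < 0) return "expired";
  if (daysLeft <= 7) return "critical";
  if (daysLeft <= 30) return "warning";
  return "ok";
}

// Connects to host:port and resolves with the leaf certificate's valid_to string.
function fetchCertificateExpiry(host, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host, port, servername: host }, () => {
      const cert = socket.getPeerCertificate();
      socket.end();
      resolve(cert.valid_to);
    });
    socket.setTimeout(10000, () => socket.destroy(new Error("TLS connect timed out")));
    socket.on("error", reject);
  });
}
```

A caller would typically chain these, for example `expiryAlertLevel(daysUntilExpiry(await fetchCertificateExpiry("example.com")))`, and page someone on "critical" or "expired".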
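# Uptime Kuma push monitor URL (for background jobs / cron)
curl "https://uptime.example.com/api/push/abc123?status=up&msg=OK&ping="
For production: use multiple probe locations (US, EU, APAC) to detect regional failures. A single Uptime Kuma instance probes only from the host it runs on, so this means running instances in several regions or pairing it with a hosted service such as Pingdom.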
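The push URL above can also be called from a small job wrapper that reports the job's exit code instead of always sending `status=up`. The following is a minimal sketch: the helper names are hypothetical, the base URL and token are the placeholders from the curl example, and it assumes the push endpoint treats `status=down` as a failure, which is worth confirming against your Uptime Kuma version. The elapsed time is sent as the `ping` value.

```javascript
// Sketch: run a job, then report its outcome to a push monitor.
// Helper names are illustrative; baseUrl and token are placeholders.
const { spawnSync } = require("node:child_process");

// Builds the push URL with the same query parameters as the curl example.
function buildPushUrl(baseUrl, token, { status, msg, ping = "" }) {
  const url = new URL(`/api/push/${encodeURIComponent(token)}`, baseUrl);
  url.searchParams.set("status", status);
  url.searchParams.set("msg", msg);
  url.searchParams.set("ping", String(ping));
  return url.toString();
}

// Maps a process exit code to push parameters: 0 is up, anything else is down.
function pushParamsForExit(exitCode, elapsedMs) {
  return exitCode === 0
    ? { status: "up", msg: "OK", ping: elapsedMs }
    : { status: "down", msg: `exit code ${exitCode}`, ping: elapsedMs };
}

// Runs the command synchronously, then sends one push with the result.
async function runAndReport(command, args, baseUrl, token) {
  const started = Date.now();
  const result = spawnSync(command, args, { stdio: "inherit" });
  // A null status means the process was killed by a signal or never started.
  const exitCode = result.status ?? 1;
  const params = pushParamsForExit(exitCode, Date.now() - started);
  await fetch(buildPushUrl(baseUrl, token, params)); // global fetch needs Node 18+
  return exitCode;
}
```

If the job crashes before the wrapper can send anything, the monitor still alerts once the heartbeat interval elapses, so the two signals complement each other.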
Why
External probes experience the same path as users: public DNS, CDN edge, load balancer, TLS termination. This catches failures that internal monitoring is structurally unable to detect.
Gotchas
- A single probe location can raise false alarms when the probe's own network has a problem; use 3+ locations and treat a failure seen from only one of them as suspect
- Keyword checks are more valuable than status code checks: a 200 response with 'Service Unavailable' in the body will fool a status-code-only check
- Certificate checks should alert at 30 days and again at 7 days before expiry; don't wait for the certificate to expire
- Push monitors for cron jobs alert when the heartbeat stops arriving, not when the job fails; a job that pings regardless of outcome looks healthy, so send the ping only after checking the process exit code
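The first two gotchas can be illustrated together with a rough sketch: a verdict that requires both a 2xx status and the expected keyword, and a majority vote across probe locations. The function names, the location keys, and the "Dashboard" keyword are all hypothetical, and this is not Uptime Kuma's or Pingdom's internal logic.

```javascript
// Sketch: keyword-aware health verdict plus a majority vote across locations.
// All names and sample values here are illustrative.

// Healthy only if the status is 2xx AND the expected keyword appears in the body.
function isHealthy({ status, body }, keyword) {
  return status >= 200 && status < 300 && body.includes(keyword);
}

// Declares an outage only when more than half of the locations report unhealthy.
function majorityDown(healthyByLocation) {
  const verdicts = Object.values(healthyByLocation);
  const failures = verdicts.filter((healthy) => !healthy).length;
  return failures > verdicts.length / 2;
}

// A 200 response carrying an error page fails the keyword check.
const errorPage = { status: 200, body: "<h1>Service Unavailable</h1>" };
const goodPage = { status: 200, body: "<title>Dashboard</title>" };
console.log(isHealthy(errorPage, "Dashboard")); // false
console.log(isHealthy(goodPage, "Dashboard")); // true

// One failing location out of three does not count as an outage.
console.log(majorityDown({ us: true, eu: false, apac: true })); // false
```

Requiring a majority trades a little detection speed for fewer pages caused by a single probe's network trouble; a genuinely regional failure still shows up in the per-location results.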
Context
Adding external availability monitoring to catch issues that internal metrics miss