gotcha, Major, pending

Gotcha: DNS TTL caching causes stale connections

Submitted by: @anonymous
dns caching, dns ttl, stale dns, dns flush, jvm dns cache

Error Messages

connection refused after DNS change
still connecting to old server
ENETUNREACH after migration

Problem

The application keeps connecting to the old IP address after a DNS record changes, because the old answer is still cached at one or more layers between the application and the authoritative server.

Solution

DNS caching layers and how to handle them:

CACHING LAYERS (each can hold stale records):

1. Application/Runtime
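   - JVM: caches successful lookups forever when a security manager is
     installed; without one the default is implementation-specific
     (30 seconds in OpenJDK)
     Fix: -Dsun.net.inetaddr.ttl=30, or set networkaddress.cache.ttl=30
     in the java.security file
   - Node.js: dns.lookup() uses the OS resolver and does not cache, but
     connection pools and keep-alive agents hold connections to the old IP
     Fix: Set a keep-alive timeout, use dns.resolve() for custom logic
   - Python: socket.getaddrinfo() uses the OS resolver and keeps no cache
     of its own; long-lived connections still stay on the old IP
   - Go: no built-in DNS cache, every lookup goes to the resolver; idle
     connections kept by http.Transport still point at the old IP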

2. OS Resolver
   - macOS: sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
   - Linux (systemd-resolved): resolvectl flush-caches
   - Windows: ipconfig /flushdns

3. Local DNS Server / Router
   - May cache based on its own TTL settings
   - Restart router or wait for cache expiry

4. ISP DNS / Upstream Resolver
   - Respects TTL from authoritative server
   - But may enforce minimum TTL (e.g., 30s minimum)

5. CDN / Load Balancer
   - May cache DNS responses
   - Check provider-specific cache purge
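
To check whether a process is pinned to an address that DNS no longer returns, a minimal Python sketch could compare the peer IP of a live connection with a fresh lookup. The helper names resolve_ips and is_stale and the injectable resolver parameter are illustrative assumptions, not part of any library. The lookup still goes through the OS resolver, so it only reflects what layers 2 to 4 currently return.

```python
import socket


def resolve_ips(host, port, resolver=socket.getaddrinfo):
    """Return the set of IP addresses the resolver currently gives for host."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def is_stale(connected_ip, host, port, resolver=socket.getaddrinfo):
    """True if connected_ip is no longer among the addresses DNS returns."""
    return connected_ip not in resolve_ips(host, port, resolver)
```

With an open socket, the connected IP is available as sock.getpeername()[0], so a call might look like is_stale(sock.getpeername()[0], "db.example.internal", 5432), where the host name is a placeholder.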


Best practices:
- Lower the TTL (60-300s) well BEFORE making DNS changes
- Wait at least one full old TTL after lowering it before changing the
  record, so that every cache has picked up the low TTL
- Use health checks so load balancers stop routing to old IPs
- For zero-downtime: keep both old and new IPs serving traffic
  during the migration
- Connection pools: configure max connection age/lifetime
  to force periodic DNS re-resolution
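
The connection-age practice can be sketched in Python as a small wrapper that replaces its socket once it is older than a limit, which forces a new lookup. AgingConnection, max_age_s, connect and clock are hypothetical names chosen for this illustration; real connection pools usually expose an equivalent "max lifetime" setting that should be preferred.

```python
import socket
import time


class AgingConnection:
    """Reconnect, and therefore re-resolve DNS, once the socket is too old."""

    def __init__(self, host, port, max_age_s=60.0,
                 connect=socket.create_connection, clock=time.monotonic):
        self.host, self.port = host, port
        self.max_age_s = max_age_s
        self._connect = connect   # injectable so the sketch can be tested
        self._clock = clock
        self._sock = None
        self._opened_at = 0.0

    def get(self):
        """Return a socket, replacing it if it has reached max_age_s."""
        now = self._clock()
        if self._sock is None or now - self._opened_at >= self.max_age_s:
            if self._sock is not None:
                self._sock.close()
            # create_connection resolves the host name again on every call.
            self._sock = self._connect((self.host, self.port))
            self._opened_at = now
        return self._sock
```

Keeping max_age_s close to the record's TTL bounds how long the process can stay on an old IP after the caches below it have expired.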

Why

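DNS is cached at every layer between your app and the authoritative server, and each layer can keep serving the old answer until its own copy expires. The JVM is the most common surprise in production: with a security manager installed it caches successful lookups forever, so a changed record is never seen until the process restarts.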
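
Because every cache may hold a record for up to its TTL, the timing of a migration follows from simple arithmetic. The sketch below is a hedged illustration in Python: cutover_window is a hypothetical helper, and it assumes resolvers honor the published TTLs (no minimum-TTL clamping) and that no application-level cache holds entries longer.

```python
from datetime import datetime, timedelta, timezone


def cutover_window(ttl_lowered_at, old_ttl_s, new_ttl_s):
    """Return (earliest_change, earliest_retire) for a DNS migration.

    earliest_change: when the record can be changed, one full old TTL
        after the TTL was lowered.
    earliest_retire: when the old IP can be shut down, one full new TTL
        later, assuming the record is changed exactly at earliest_change.
    """
    earliest_change = ttl_lowered_at + timedelta(seconds=old_ttl_s)
    earliest_retire = earliest_change + timedelta(seconds=new_ttl_s)
    return earliest_change, earliest_retire


# Example: TTL lowered from 3600s to 60s at midnight UTC.
lowered = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
change_at, retire_at = cutover_window(lowered, old_ttl_s=3600, new_ttl_s=60)
```

In this example the record can be changed from 01:00 UTC and the old IP retired from 01:01 UTC; in practice it is safer to add margin for caches that do not follow the assumptions above.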

Context

DNS changes and infrastructure migrations
