HiveBrain v1.2.0

How to fix a "heartbeat failure" in Docker Swarm?

Submitted by: @import:stackexchange-devops

Problem

My cluster is currently located in a single data center. I've been trying to change that by adding a single worker node from another data center, but so far it hasn't worked.

I'm able to make this node join the swarm and get listed by the managers, but it is always shown as "Down". Here is what "docker inspect" shows me about this node:

"Status": {
    "State": "down",
    "Message": "heartbeat failure",
    "Addr": "xxx.xxx.xxx.xxx"
}
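To spot every affected node at once instead of inspecting them one by one, the JSON array printed by `docker node inspect` can be filtered for nodes that are not `ready`. This is a minimal sketch, assuming it runs on a manager node with the `docker` CLI on PATH; `inspect_all_nodes` and `down_nodes` are hypothetical helper names, not part of any Docker SDK.

```python
import json
import subprocess
from typing import Any


def inspect_all_nodes() -> str:
    """Return the raw JSON array for every swarm node (only works on a manager)."""
    ids = subprocess.run(
        ["docker", "node", "ls", "-q"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    return subprocess.run(
        ["docker", "node", "inspect", *ids],
        check=True, capture_output=True, text=True,
    ).stdout


def down_nodes(inspect_output: str) -> list[tuple[str, str, str]]:
    """Return (hostname, state, message) for each node whose state is not "ready"."""
    nodes: list[dict[str, Any]] = json.loads(inspect_output)
    result = []
    for node in nodes:
        status = node.get("Status", {})
        state = status.get("State", "unknown")
        if state != "ready":
            # Fall back to the node ID if the description carries no hostname.
            hostname = node.get("Description", {}).get("Hostname", node.get("ID", "?"))
            result.append((hostname, state, status.get("Message", "")))
    return result
```

On a manager, `for host, state, msg in down_nodes(inspect_all_nodes()): print(host, state, msg)` lists each unhealthy node together with its `Status.Message`, such as "heartbeat failure".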


I've opened the following ports on both sides:

2377 tcp
7946 tcp+udp
4789 udp
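Opening ports in a firewall does not guarantee that they are reachable across data centers, so it may help to test the TCP ports from each side. This is a minimal sketch under a few assumptions: `tcp_port_open` and `check_swarm_ports` are hypothetical names; port 2377 only listens on managers, so that check is meaningful from the worker toward a manager; and the UDP ports (7946/udp, 4789/udp) cannot be verified this way, because UDP has no handshake to confirm.

```python
import socket

# TCP ports from the list above; the UDP halves need a different test.
SWARM_TCP_PORTS = (2377, 7946)


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False


def check_swarm_ports(host: str) -> dict[int, bool]:
    """Map each swarm TCP port to whether it is reachable on `host`."""
    return {port: tcp_port_open(host, port) for port in SWARM_TCP_PORTS}
```

Run `check_swarm_ports("<manager address>")` from the remote worker; any `False` entry points at a firewall, NAT, or routing problem between the two data centers rather than at Docker itself.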


How do I troubleshoot and fix this?

Solution

This may not address your specific cross-data-center IP setup.

I occasionally run into one or more swarm nodes reporting Status: Down and Availability: Active, with Status.Message set to "heartbeat failure". This can happen after a reboot.

What helped was to stop the Docker daemon, remove /var/lib/docker/swarm/worker/tasks.db, and start the Docker daemon again.

Source: https://github.com/moby/moby/issues/34827#issuecomment-457678500
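The stop/remove/start steps above can be sketched as a small Python helper. This assumes a systemd host where the daemon runs as the `docker` unit and the script runs as root; `reset_worker_tasks_db` is a hypothetical name, and it defaults to a dry run that only returns the planned actions without touching the system.

```python
import subprocess
from pathlib import Path

# File named in the answer above.
TASKS_DB = Path("/var/lib/docker/swarm/worker/tasks.db")


def reset_worker_tasks_db(tasks_db: Path = TASKS_DB, dry_run: bool = True) -> list[str]:
    """Stop dockerd, delete tasks.db, then start dockerd again.

    Assumes systemd with a unit named "docker" and root privileges.
    With dry_run=True nothing is executed; the planned actions are returned.
    """
    actions = [
        "systemctl stop docker",
        f"rm {tasks_db}",
        "systemctl start docker",
    ]
    if dry_run:
        return actions
    subprocess.run(["systemctl", "stop", "docker"], check=True)
    try:
        tasks_db.unlink(missing_ok=True)
    finally:
        # Start the daemon even if the delete failed, so the node is not left stopped.
        subprocess.run(["systemctl", "start", "docker"], check=True)
    return actions
```

Calling `reset_worker_tasks_db(dry_run=False)` on the affected node performs the steps. Where `docker.socket` is enabled, systemd may warn that the socket can still re-activate the daemon while it is stopped.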

Sometimes it fixes itself: https://stackoverflow.com/a/54126180/2087704
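Since the node sometimes recovers on its own, it may be worth polling its state for a while before intervening. This is a minimal sketch, assuming it runs on a manager with the `docker` CLI available; `wait_for_ready` and its timeout and interval defaults are hypothetical choices, and `get_state`/`sleep` are injectable so the loop can be exercised without a cluster.

```python
import subprocess
import time
from typing import Callable


def node_state(node: str) -> str:
    """Return Status.State (e.g. "ready" or "down") for a node; run on a manager."""
    out = subprocess.run(
        ["docker", "node", "inspect", "--format", "{{.Status.State}}", node],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def wait_for_ready(
    node: str,
    timeout: float = 300.0,
    interval: float = 10.0,
    get_state: Callable[[str], str] = node_state,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the node reports "ready"; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if get_state(node) == "ready":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

If `wait_for_ready("<node>")` returns `False`, the node did not recover by itself within the window, and the tasks.db reset above becomes the next step.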

Context

StackExchange DevOps Q#5144, answer score: 2
