Why does most Galera cluster documentation recommend having at least 3 nodes as the typical setup?
Problem
The Galera documentation, and most experts who implement this type of setup, recommend having at least 3 nodes in a Galera cluster. What is the reasoning behind this?
Is a cluster setup with only two nodes essentially flawed?
Solution
If you want real HA ("High Availability"), you need 3 nodes in 3 separate locations.
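When one machine goes down, the remaining nodes vote and see that they are in the majority: 2 out of 3. So they declare themselves to be the cluster (Galera calls this the Primary Component) and their sibling to be out of the cluster. If the dead machine comes back online, the majority chatter with the reanimated one and bring it back up to date, either by sending just the write-sets it missed (IST, Incremental State Transfer) or, if it has fallen too far behind, a full copy of the data (SST, State Snapshot Transfer). All of this is automagic.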
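A minimal Python sketch of the majority vote described above. It assumes every node carries Galera's default vote weight of 1 (the pc.weight provider option), and has_quorum is a hypothetical helper for illustration, not a Galera API. Doubling the count avoids fractions: 2 of 3 passes because 4 > 3.

```python
def has_quorum(members, last_primary):
    """Hypothetical helper: True if `members` hold a strict majority
    of the last Primary Component (every node assumed to weigh 1)."""
    # Compare doubled counts to avoid fractions: 2 of 3 -> 4 > 3.
    return len(members & last_primary) * 2 > len(last_primary)

cluster = {"db1", "db2", "db3"}
survivors = cluster - {"db3"}           # db3 crashes
print(has_quorum(survivors, cluster))   # True: 2 of 3 is a majority
print(has_quorum({"db3"}, cluster))     # False: 1 of 3 is not
```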
Let's take a slight variant of that -- 3 machines, but a network outage splits one away from the other two. The two, as above, declare themselves still to be the cluster. The lone node, not having a majority, realizes that it should not be the cluster. So it plays dead: it drops into a non-Primary state and refuses to serve queries until it can rejoin the majority.
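The same rule, applied independently on each side of a network split. This is a sketch under the same assumption of one vote per node, and the helper names are hypothetical. Because each side needs a strict majority of the last cluster, two disjoint sides can never both qualify, which is what prevents split-brain.

```python
def has_quorum(side, last_primary):
    # Hypothetical helper: strict majority of the last Primary Component's votes.
    return len(side) * 2 > len(last_primary)

def evaluate_partition(last_primary, sides):
    """Hypothetical helper: label each side of a network split."""
    return {
        frozenset(side): "Primary" if has_quorum(side, last_primary) else "non-Primary"
        for side in sides
    }

cluster = {"db1", "db2", "db3"}
result = evaluate_partition(cluster, [{"db1", "db2"}, {"db3"}])
for side, state in result.items():
    print(sorted(side), state)
# ['db1', 'db2'] Primary
# ['db3'] non-Primary
```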
What about only 2 nodes? -- If one dies, or is isolated, the survivor sees only 1 vote out of 2, which is not a majority, so it plays dead. You have no cluster, and you are hosed.
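A sketch of the two-node case, modelled on the weighted-quorum rule in the Galera documentation, where nodes known to have left gracefully are subtracted from the baseline before the majority test. All weights are assumed to be 1, and has_quorum is a hypothetical helper. It shows why a crash or isolation is fatal, while a clean shutdown of one node is not.

```python
def has_quorum(members, last_primary, left_gracefully=frozenset()):
    # Hypothetical helper: nodes that left gracefully are removed from the
    # baseline, then the survivors need a strict majority of what remains.
    baseline = len(last_primary - left_gracefully)
    return len(members) * 2 > baseline

pair = {"db1", "db2"}
# db2 crashes or is cut off: db1 sees 1 of 2, which is not a majority.
print(has_quorum({"db1"}, pair))                           # False
# db2 is shut down cleanly: the baseline drops to 1, so db1 keeps quorum.
print(has_quorum({"db1"}, pair, left_gracefully={"db2"}))  # True
```

From the survivor's side, a crash and a network cut look identical, so it has to assume the other node might still be serving writes and step down.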
2 + garbd -- This is where you have 2 machines with all the data, plus a third, little "arbitrator" machine running garbd (the Galera Arbitrator). garbd acts like a member of the cluster, but stores no data. It does vote, though, so the cluster survives the loss of any single machine.
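Here is the 2 + garbd layout as a simple vote count. The sketch assumes garbd has the default weight of 1, like the data nodes; the voters dictionary and has_quorum helper are illustrative, not Galera configuration.

```python
# Two data nodes plus the arbitrator; garbd stores no data but still votes.
voters = {"db1": 1, "db2": 1, "garbd": 1}

def has_quorum(alive, weights):
    # Hypothetical helper: surviving weight must exceed half the total weight.
    total = sum(weights.values())
    return sum(weights[n] for n in alive) * 2 > total

for lost in voters:
    alive = set(voters) - {lost}
    print(f"lose {lost}: quorum={has_quorum(alive, voters)}")
# lose db1: quorum=True
# lose db2: quorum=True
# lose garbd: quorum=True
```

Losing any two of the three still leaves a single vote out of three, so the arbitrator protects against one failure, not two.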
Why did I say "separate locations"? What if you have a 3-node cluster with all 3 in the same rack, in the same building, on the same flood plain, on the same earthquake fault, in the path of a tornado, etc.? Sooner or later, a single disaster is likely to take out all 3 nodes at once. No HA.
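To put the "separate locations" point in the same terms, this hypothetical check treats a disaster as the loss of every node at one location and asks whether a majority of votes survives. The placements and the survives_any_site_loss helper are made-up examples, with one vote per node assumed.

```python
def survives_any_site_loss(placement):
    """Hypothetical helper: placement maps node name -> location.
    True if losing every node at any one location still leaves a majority."""
    nodes = set(placement)
    for site in set(placement.values()):
        survivors = {n for n in nodes if placement[n] != site}
        if len(survivors) * 2 <= len(nodes):
            return False
    return True

same_rack = {"db1": "rack-A", "db2": "rack-A", "db3": "rack-A"}
three_sites = {"db1": "site-1", "db2": "site-2", "db3": "site-3"}
print(survives_any_site_loss(same_rack))    # False
print(survives_any_site_loss(three_sites))  # True
```

A 2 + 1 split across two sites fails the same test: losing the site with two nodes leaves one vote out of three.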
I have a few more comments and tips on Galera in http://mysql.rjweb.org/doc.php/galera
Context
StackExchange Database Administrators Q#93266, answer score: 6