patternsqlMinor

MySQL Replication and High Availability

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

highreplicationmysqlavailabilityand

Problem

I know this might not be a very good question to be asked in this site, but I really need some helps/hints to be directed to the right path.

I want to have failover strategy for 2 MySQL servers which are in Master-Master Replication mode. I have done the replication, and it's working just fine. Yet, when in comes to the failover strategy, I'm kind of clueless...

I've been reading on the internet over the past couple of days (I know this is a very complex procedure, and I could not master it in a week or so), but the more I read the more confuse I get. Basically, as I understood, I need to have a software to monitor the network and report any failure and another one to switch between two systems. Heartbeat apparently is the best in doing so, but in all tutorials I read, they are using many other software ( like MON for monitoring, or other software for dumping or getting backups) as well. Is it necessary to have them? Can one have fail-over strategy only using MySQL and HA tool like Heartbeat?

Each website is introducing different tool and ways, which they have their own pros and cons, and of course it depends on how I want my "cluster" to work, and how much high availability is important to my system. I am just considering pointing it out to my customers, and it's not going to the production phase any soon, since my customers running small businesses and don't care about loosing data that much, however, I wanted to dig into the concept by setting up a simple master-master replication where in case of failure of any servers, clients automatically commit changes to the standby server.

Solution

Given the fact that you mentioned you have Master-Master replication mode, I would not recommend any automatic failover unless you properly account for Replication Lag. After all, MySQL Replication is asynchronous.

It is theoretically possible to have the following:

DBServer1 as Master to DBServer2

DBServer2 as Master to DBServer1

DBVIP pointing at DBServer1

DBServer2 is 180 seconds behind

DBServer1 goes down

Automatic Failover moves DBVIP to DBServer2

With this scenario, DBServer2 could have auto increment keys that do not exist yet. Upon failover, the DBVIP will allow WebServers to connect to DBServer2 and ask for data that does not exist yet.

This would therefore require background processes running on each DBServer.

For the above scenario:

DBVIP is on DBServer1

DBServer1 runs HeartBeat

DBServer2 runs HeartBeat

Background Process on DBServer1 to monitor

a) Data Mount Availability

b) Data Mount Writeability

c) MySQL Connectivity

Once a,b, or c fail, kill HeartBeat

Background Process on DBServer2 to make sure DBVIP is pingable

What should killing HeartBeat do? Trigger the startup script defined for it.

What should the startup script on DBServer2 look for?

Loop until DBVIP is unreachable via ping

Connect to MySQL and

Run SHOW SLAVE STATUS\G in a Loop until Seconds_Behind_Master is NULL

RUn SHOW SLAVE STATUS\G in a Loop until Exec_Master_Log_Pos stops changing

Assign DBVIP to DBServer2 via ip addr add




This is essentailly the algorithm for failing over safely to a Passive Master in a Master/Master Replication Cluster.

ALTERNATIVE

If ALL your data is InnoDB, I recommend something with less rigor. Perhaps you should look into using DRBD and HeartBeat. Here is why:

DRBD provides network RAID-1 for a Block Device on two servers.

You would essentially do this:

Have DBServer1's DRBD Block Device as Primary

Have DBServer2's DRBD Block Device as Secondary

Mount DBServer1's DRBD Device on /var/lib/mysql

Startup MySQL on DBServer1




Have HeartBeat Monitor Ping Activity Between Servers



What would startup script look like in a DRBD scenario?

Loop until DBVIP is unreachable via ping



Kill HeartBeart



Disconnect DRBD



Promote DRBD to Primary



Mount DRBD on /var/lib/mysql



Start MySQL (InnoDB Crash Recovery Fills in Missing Data)

Assign DBVIP via ip addr add




This is a lot more straightforward because only one side is Active. The Passive side (DRBD Secondary) is a Synchronous Disk Copy of the Active Side (DRBD Primary).

CAVEAT

If all or most of the working set data is MyISAM, do not touch DRBD. Crash scnearios quickly result in MyISAM tables being marked crashed and need auto-repair (which can be paintfully slow to wait for).

UPDATE 2012-12-29 08:00 EDT

Here are my past posts on using DRBD with MySQL

Mar 29, 2011 : MySQL high availability, failover and replication with Latency

Aug 29, 2011 : MySQL Replication : 1 Slave / Multiple Masters

Dec 19, 2011 : Best way to setup master to multi master replication

Jul 25, 2012` : Mysql database replication on different vlan/subnet/another site (I stirred a pretty big pot on this one in someone else's blog)

Context

StackExchange Database Administrators Q#31031, answer score: 4

Revisions (0)

No revisions yet.