With MySQL Replication, what level of resilience is possible?
Problem
I've found lots of good material on setting up MySQL replication, but not much on what to do in the event of failures. Understanding the degree of resilience I can achieve is fundamental to designing my system, so I'm not here looking for "howto" administrative advice, rather I want to understand the achievable semantics.
So, to simplify my requirements: I have an application that cannot function unless it can read a database. There are two scenarios. First, the common usage: many requests a minute, each reading the database and returning an answer. Second, less frequently: updating the database with new data. It's acceptable for the updates to be a few minutes delayed.
So my first thought: Master->Slave
Now the reader can use either the Master or the Slave; if we lose the Master for a while, the reader can work against the Slave.
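On the client side, the read-from-either-node idea above can be sketched as a simple failover loop. This is only a sketch: the host names and the `connect` callable are illustrative placeholders, not any specific driver's API.

```python
def connect_with_failover(endpoints, connect):
    """Try each endpoint in order and return the first live connection.

    endpoints: ordered list of hosts, primary first, then replicas.
    connect:   any callable that opens a connection or raises on failure
               (e.g. a thin wrapper around your MySQL driver's connect()).
    """
    last_error = None
    for host in endpoints:
        try:
            return connect(host)
        except Exception as exc:  # in practice, catch the driver's error class
            last_error = exc
    raise ConnectionError(f"all endpoints failed; last error: {last_error}")
```

A reader that tolerates slightly stale data can simply list the slave after the master; writes, of course, must still go only to the master.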
Sounds simple enough. But ... what about more drastic problems, how manual is recovery? How long does it take? What data is lost?
Take this scenario: Master->Slave. We know that the Slave is potentially a little out of sync with the Master. Now suppose we lose the Master in a way that means it won't be back any time soon.
Now presumably we need the Slave to become the writable Master, and we'll need a new Slave.
Specific Questions:
- How much time and effort does it take to make the Slave the Master? I failed to find docs on what to do; I'm guessing that this is pretty easy. Can we make the take-over seamless to client apps, by adjusting DNS routing or some such?
- If we can no longer get at the old Master's logs, then we have to accept that some updates to the old Master will never make it to the new Master, so we do have data loss?
- How much effort to create the new Slave? My guess here is that this is not difficult but potentially takes time. I was trying to imagine reducing this overhead by having two Slaves, and adjusting the replication so that when Slave 1 becomes the new Master, Slave 2 now becomes the slave of that new Master. However given the potential delays in re
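On the first question, a manual promotion with classic binlog replication usually comes down to a handful of statements. The following is a sketch only: the host name and replication user are hypothetical, and the exact steps vary by MySQL version.

```sql
-- On the surviving slave: stop fetching (the old master is gone anyway),
-- wait until SHOW SLAVE STATUS reports the relay log fully applied,
-- then cut ties with the old master and accept writes.
STOP SLAVE IO_THREAD;
STOP SLAVE;
RESET SLAVE ALL;               -- forget the old master entirely
SET GLOBAL read_only = 0;

-- On a second slave, if present: re-point it at the promoted host.
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',   -- hypothetical host name
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...';
START SLAVE;
```

Making this seamless to clients is then a separate problem: a floating/virtual IP, a proxy, or a low-TTL DNS entry pointed at whichever node is currently writable.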
Solution
A lot of the questions you are asking are predicated on the use of master-slave replication. Life's a lot simpler if you use master-master replication: master-slave requires detecting a failure on the master and promoting the slave. While this can be automated to some extent, you also need to think about how you reinstate the failed master afterwards.
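For comparison, a minimal master-master setup is mostly configuration: each node logs its writes and replicates from the other, and the auto-increment settings keep the two writers from generating colliding keys. A hypothetical fragment:

```ini
# my.cnf on node A (node B mirrors this with server-id = 2, offset = 2)
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
auto_increment_increment = 2   # two writers in the ring
auto_increment_offset    = 1   # node B uses 2, so generated keys never collide
```

With this layout either node can take writes, so "promotion" stops being a special event; the cost is that you must think about write conflicts instead.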
Regarding data loss: get over it. Although there are various approaches implemented in different products, even moving transaction control outside of the database cluster, there is still a possibility of losing data. The only practical solution is to design your system so that it's possible to identify failed operations (which is good practice anyway, as such losses are more often caused by software bugs than by system crashes).
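One way to make failed operations identifiable, as suggested above, is to tag every write with a client-generated ID and journal it on the application side; after a failover, any journaled ID missing from the surviving node marks a lost write. A minimal sketch, in which an in-memory dict stands in for the database and all names are illustrative:

```python
import uuid


class IdempotentWriter:
    """Tag every update with a client-generated ID so that, after a
    failover, the application can tell which writes actually survived."""

    def __init__(self, store):
        self.store = store      # stands in for the database table
        self.journal = {}       # client-side record of issued writes

    def write(self, key, value):
        op_id = str(uuid.uuid4())
        self.journal[op_id] = (key, value)
        # In a real system the op_id would be stored alongside the row,
        # e.g. in an applied_ops column or table, inside the same transaction.
        self.store[key] = (value, op_id)
        return op_id

    def lost_after_failover(self):
        """Return journal entries whose op_id is absent from the store:
        these are the writes that never reached the surviving node."""
        applied = {op_id for _, op_id in self.store.values()}
        return {op: kv for op, kv in self.journal.items() if op not in applied}
```

After a failover the application replays or reports exactly the entries returned by `lost_after_failover()`, rather than guessing at what was lost.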
Regarding the details of how you implement the system: a lot depends on the OS you are running on, the impact of downtime, the architecture of the application, the nature of the network connecting the nodes in the application... lots of information which you've not provided.
There are some useful links here
Context
StackExchange Database Administrators Q#4761, answer score: 2