HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Monitoring replication on mysql

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
mysqlreplicationmonitoring

Problem

I have managed to setup master and slave replication.

It is working fine. What are the possibilities that it might go down?

Is there any alerting tool to monitor that?

Another thing: Can I run a separate db in my replication db which I just run for testing purposes?

Solution

Replication can break or misbehave in all sorts of fun and exciting ways. You need to monitor for three things:

  1. Replication is running and has not stopped due to error



To monitor whether or not replication is running is simply a matter of programmatically checking SHOW SLAVE STATUS and looking at the values for Slave_IO_Running and Slave_SQL_Running. Both should be "yes". pmp-check-mysql-replication-running from the Percona Monitoring Plugins for Nagios is written for this task.

  1. Replication is performing well (slave lag behind master is within an acceptable range)



You need to make sure that the slave has not lagged behind the master by too far. "Too far" is determined by what your application can tolerate and by how many binary logs you keep on the master server. Because replication on the slave is single-threaded, slaves can easily get lagged behind. SHOW SLAVE STATUS has the Seconds_Behind_Master value, but is not a reliable indicator of actual lag, and frequently will jump around. In order to accurately measure replication lag, you need an external application to insert a timestamp into a table periodically. You can then measure that value from the slave and compare it against the current time to get actual replication delay. pt-heartbeat is a daemon that will insert a heartbeat into a table on your server. You can then alert on that value with pmp-check-mysql-replication-delay to make sure it is within your specified parameters.

  1. The data on the servers is in sync.



There are many ways that a master and slave can get out of sync so that the data differs. You need to detect those differences and correct them periodically because a small difference can, over time, turn into a very large difference, especially with statement-based replication. This is no small task, and pt-table-checksum is designed to calculate these differences. Run this weekly. pmp-check-pt-table-checksum is a Nagios plugin to alert when the slave has data discrepancies relative to the master. To actually fix the differences, use pt-table-sync.

pt-table-checksum has been recently rewritten and is pretty easy to use. pt-table-sync has a lot of options and can be confusing. Read the documentation for these thoroughly as you can really shoot yourself in the foot if you aren't careful. Here is a webinar about these tools.


Another thing: Can I run a separate db in my replication db which I just run for testing purposes?

There is nothing preventing you from modifying (or supplementing) the data on the slave, though generally I would recommend against it. Best practice is to have the slave be read_only=1. However, real life tends to trump best practices and often slaves are used as reporting servers. My suggestion would be to make very clear access privileges for those using the slave for data modification and to have all additional tables in a separate schema.

Context

StackExchange Database Administrators Q#17501, answer score: 9

Revisions (0)

No revisions yet.