patternsqlMajor
MySQL replication - slave is continuously lagging behind master
Viewed 0 times
slavecontinuouslylaggingreplicationmysqlmasterbehind
Problem
I am using MySQL-5.1.50 with a Master-slave replication setup.
Most of the time the slave is lagging behind the master.
When I run
The slave is continuously giving alerts that replication is seconds behind the master. Sometimes, the lag time increases.
How do I diagnose the cause of the problem?
I need urgent help, since this problem has persisted for the last 20 days.
Most of the time the slave is lagging behind the master.
When I run
show processlist;, there is no query that's taking a long time. I enabled slow_log as well. However, it does not find any slow running query.The slave is continuously giving alerts that replication is seconds behind the master. Sometimes, the lag time increases.
How do I diagnose the cause of the problem?
I need urgent help, since this problem has persisted for the last 20 days.
Solution
The Seconds_Behind_Master is really like viewing the past via time travel.
Think of it this way:
In like manner, it seems that the Master is processing a lot of queries at the same time.
You look back at the Slave, run
There is another metric to look at besides
For example, suppose you have the following scenario on the Master:
When the Slave reads those queries from its relay log and processes them one by one
Concerning the Slow Log, the default for long_query_time is 10 seconds. If all your queries in the relay logs are less than 10 seconds, you will never catch anything in the Slow Query Log.
I have the following recommendations for both Master and Slave servers
FURTHER TROUBLESHOOTING
If you want to see the queries causing the replciation lag, do the following:
For example, Let's do
If I run
Dump the contents as follows:
You can now see the queries the Slave is currently
Think of it this way:
- The Sun is 93,000,000 miles away from the Earth
- The speed of light is 186,000 miles/sec
- Simple division shows that it takes roughly 500 sec (8 min 20 sec) for the Sun's light to reach Earth
- When you look at the Sun, you actually don't see the Sun. You see where it was 8 min 20 sec ago.
In like manner, it seems that the Master is processing a lot of queries at the same time.
You look back at the Slave, run
SHOW SLAVE STATUS\G and it says 200 for Seconds_Behind_Master. How is that number calculated? Slave's Clock Time (UNIX_TIMESTAMP(NOW()) - TIMESTAMP of the Query when it was completed and recorded in the Master's Binary Log.There is another metric to look at besides
Seconds_Behind_Master. That metric is called Relay_Log_Space. That represents the sum of all bytes for all relay files on the Slave. By default, the largest single relay log is limited to 1GB. If Relay_Log_Space is less than 1GB, this indicates that many long running queries executed on the Master in parallel. Unfortunately, due to the single-threaded nature Replication's SQL thread, queries are executed one behind the other.For example, suppose you have the following scenario on the Master:
- Slow Query log is enabled
- 20 queries executed in parallel on the Master
- Each query took 3 seconds
- Each query get recorded in the Master Binary Log with the same timestamp
When the Slave reads those queries from its relay log and processes them one by one
- the Slave's Clock will be moving
- the TIMESTAMP for each of the 20 queries will be identical
- the difference will increasing 3 seconds be completed query
- this results in 60 seconds for
Seconds_Behind_Master
Concerning the Slow Log, the default for long_query_time is 10 seconds. If all your queries in the relay logs are less than 10 seconds, you will never catch anything in the Slow Query Log.
I have the following recommendations for both Master and Slave servers
- RECOMMENDATION #1 : Upgrade to MySQL 5.5. Under MySQL 5.5 and Percona Server 5.1.38+, you can tune InnoDB to access multiple CPUs. I have written past posts about this
Apr 26, 2012: Is the CPU performance relevant for a database server?
Sep 20, 2011: Multi cores and MySQL Performance
Sep 12, 2011: Possible to make MySQL use more than one core?
May 26, 2011: About single threaded versus multithreaded databases performance
- RECOMMENDATION #2 : Use InnoDB for all tables. InnoDB caches data and indexes in RAM, MyISAM only caches indexes.
- RECOMMENDATION #3 : Increase RAM. You must cache more data and indexes on the Slave and Master alike
- RECOMMENDATION #4 : Tune all queries. Reducing milliseconds from queries that run hundreds of times goes a long way to reduce
Seconds_Behind_Master.
FURTHER TROUBLESHOOTING
If you want to see the queries causing the replciation lag, do the following:
SHOW SLAVE STATUS\G
- Get name of relay log from
Relay_Log_File
STOP SLAVE;
START SLAVE;
- In the OS,
cd /var/lib/mysqlor wherever the relay logs are written
- Dump the relay log to a text file
For example, Let's do
SHOW SLAVE STATUS\GSlave_IO_State: Waiting for master to send event
Master_Host: 10.64.51.149
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000009
Read_Master_Log_Pos: 1024035856
Relay_Log_File: relay-bin.000030
Relay_Log_Pos: 794732078
Relay_Master_Log_File: mysql-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: search_cache
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1024035856
Relay_Log_Space: 794732271
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 106451149If I run
STOP SLAVE; START SLAVE;, the relay log closes and a new one is open. Yet, you want relay-bin.000030.Dump the contents as follows:
cd /var/lib/mysql
mysqlbinlog relay-bin.000030 > /root/RelayLogQueries.txt
less /root/RelayLogQueries.txtYou can now see the queries the Slave is currently
Code Snippets
Slave_IO_State: Waiting for master to send event
Master_Host: 10.64.51.149
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000009
Read_Master_Log_Pos: 1024035856
Relay_Log_File: relay-bin.000030
Relay_Log_Pos: 794732078
Relay_Master_Log_File: mysql-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: search_cache
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1024035856
Relay_Log_Space: 794732271
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 106451149cd /var/lib/mysql
mysqlbinlog relay-bin.000030 > /root/RelayLogQueries.txt
less /root/RelayLogQueries.txtContext
StackExchange Database Administrators Q#24793, answer score: 21
Revisions (0)
No revisions yet.