patternsqlMinor
MySQL replication lagging behind with log_slave_updates
Viewed 0 times
log_slave_updateswithlaggingreplicationmysqlbehind
Problem
I'm in the process of setting up replication in this topology:
In order to achieve this I need log_slave_updates to be enabled on C, so D will also get the changes from A.
C is a pretty beefy server. Amazon EC2-based. 32 cores, 250GB memory and 20000 IOPS data volume. It's more powerful than A, which has less CPU power, memory and 3200 IOPS on its main disk.
As soon as I turn on
I don't know where to start debugging this.
Edit Thanks for your follow-up questions, I added a bit more information here:
I did indeed create the EBS volume from a snapshot, but I warmed it up by reading every byte using
Output of
This is the output of
```
Slave_IO_State: Waiting for master to send event
Master_Host: anonimized
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.001253
Read_Master_Log_Pos: 969363410
Relay_Log_File: mysql-relay-bin.002984
Relay_Log_Pos: 44865262
Relay_Master_Log_File: mysql-bin.001253
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 910276312
Relay_Log_Space: 103952567
Until_Condi
/--B
A-
\--C--DIn order to achieve this I need log_slave_updates to be enabled on C, so D will also get the changes from A.
C is a pretty beefy server. Amazon EC2-based. 32 cores, 250GB memory and 20000 IOPS data volume. It's more powerful than A, which has less CPU power, memory and 3200 IOPS on its main disk.
As soon as I turn on
log_slave_updates on C, it almost immediately falls behind A. This is odd to me because I would expect it to easily outperform, and in my mind log_slave_updates is just writing its log to disk.I don't know where to start debugging this.
Edit Thanks for your follow-up questions, I added a bit more information here:
I did indeed create the EBS volume from a snapshot, but I warmed it up by reading every byte using
dd this took a few days.Output of
SELECT @@INNODB_FLUSH_LOG_AT_TRX_COMMIT, @@SYNC_BINLOG\G is:@@INNODB_FLUSH_LOG_AT_TRX_COMMIT: 2
@@SYNC_BINLOG: 1This is the output of
SHOW SLAVE STATUS a few minutes after turning log_slave_updates on again:```
Slave_IO_State: Waiting for master to send event
Master_Host: anonimized
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.001253
Read_Master_Log_Pos: 969363410
Relay_Log_File: mysql-relay-bin.002984
Relay_Log_Pos: 44865262
Relay_Master_Log_File: mysql-bin.001253
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 910276312
Relay_Log_Space: 103952567
Until_Condi
Solution
First of all, you need to rely on
Let's take a sample display from an old answer of mine : MySQL Replication Lag Behaving Erraticly. In the display, please note the items that can help you sense measure replication
Here are the items
Lag can only be measured, or at least noticed, by observing one or more of the following:
I sometimes see lag increasing because of taking a large mysqldump and loading it hours later (even a day later). When starting up replication for the first time, the apparent numerical lag in Seconds would disappear after several hours (or even a day) out of nowhere.
Please note that there is only one SQL thread for replication. All queries are processed sequentially. How does this affect replication lag ???
Suppose the Master executed 300 inserts, updates, and deletes in the same second and each query took 1 second to run. Here is what is actually happening:
I have discussed MySQL Replication Lag many times over the years.
Even with the beefiest scaled-up Slave, it will only process one SQL transaction at a time. A beefy Master would make sense if you eventually failover to such server and promote it to a Master.
In conclusion, please keep your eye on these things as the Slave eventually catches up.
UPDATE 2017-01-07 17:21 EST
Here is what I find disturbing about turning on log_slave_updates
Relay Logs and Binary Logs are sequentially written. The tablespace files for InnoDB tables (
Perhaps if you store your Relay Logs and Binary Logs on a volume separate from the data, then the sequential writes on those logs will increase in speed.
In the DBA StackExchange, I referred to a FaceBook Engineer eight(8) times who splits up data fr
SHOW SLAVE STATUS\GLet's take a sample display from an old answer of mine : MySQL Replication Lag Behaving Erraticly. In the display, please note the items that can help you sense measure replication
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.17.20.102
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.002814
Read_Master_Log_Pos: 823078734
Relay_Log_File: relay-bin.007364
Relay_Log_Pos: 823078879
Relay_Master_Log_File: mysql-bin.002814
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 823078734
Relay_Log_Space: 823079071
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)Here are the items
Seconds_Behind_Master: The number of seconds behind the slave is from the master based on the time in the Slave's OS minus the TIMESTAMP of the latest query processed on the Slave.
Relay_Log_Space: Total number of bytes collected on the Slave from the Master yet to be processed
- (
Master_Log_File,Read_Master_Log_Pos) : The latest binlog entry from the Master to be recorded in the Slave's Relay Logs.
- (
Relay_Master_Log_File,Exec_Master_Log_Pos) : The latest binlog entry from the Master to be executed in the Slave.
Lag can only be measured, or at least noticed, by observing one or more of the following:
Seconds_Behind_Masterkeeps increasing
Master_Log_FileandRelay_Master_Log_Fileare not the same (binlog events are being collected faster that the SQL thread can process them)
Exec_Master_Log_Posis not moving even ifSlave_SQL_Running: Yes(This could be a long running query or possibly a very large transaction commit)
Relay_Log_Spaceis increasing steadily
- Indicates amount of binlog events collected
- This could mean slow collection of binlog events over the network
I sometimes see lag increasing because of taking a large mysqldump and loading it hours later (even a day later). When starting up replication for the first time, the apparent numerical lag in Seconds would disappear after several hours (or even a day) out of nowhere.
Please note that there is only one SQL thread for replication. All queries are processed sequentially. How does this affect replication lag ???
Suppose the Master executed 300 inserts, updates, and deletes in the same second and each query took 1 second to run. Here is what is actually happening:
- All queries are serialized when written into the Master's binary logs
- The Slave reads the binlog events from the Master ONE AT A TIME !!!
- The Slave executes the binlog events it recorded ONE AT A TIME !!!
Seconds_Behind_Masterwould actually increase by 300 seconds (5 minutes)
- Keep in mind that the result of transaction could be huge and cannot be applied until all the binlog events making up the one transaction is completely written into the relay logs on the Master (This can usually occur with ROW based replication). This also contributes to increased lag.
I have discussed MySQL Replication Lag many times over the years.
Even with the beefiest scaled-up Slave, it will only process one SQL transaction at a time. A beefy Master would make sense if you eventually failover to such server and promote it to a Master.
In conclusion, please keep your eye on these things as the Slave eventually catches up.
UPDATE 2017-01-07 17:21 EST
Here is what I find disturbing about turning on log_slave_updates
Relay Logs and Binary Logs are sequentially written. The tablespace files for InnoDB tables (
.ibd) experience random writes. If sequential and random writes occur on the same disk, you can very sure the random writes will become the bottleneck of sequential writes.Perhaps if you store your Relay Logs and Binary Logs on a volume separate from the data, then the sequential writes on those logs will increase in speed.
In the DBA StackExchange, I referred to a FaceBook Engineer eight(8) times who splits up data fr
Code Snippets
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.17.20.102
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.002814
Read_Master_Log_Pos: 823078734
Relay_Log_File: relay-bin.007364
Relay_Log_Pos: 823078879
Relay_Master_Log_File: mysql-bin.002814
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 823078734
Relay_Log_Space: 823079071
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)Context
StackExchange Database Administrators Q#160207, answer score: 2
Revisions (0)
No revisions yet.