debugsqlMinor
SQL_SLAVE_SKIP_COUNTER = 1 fails, setting @@gtid_slave_pos used to skip a given GTID position
Viewed 0 times
sql_slave_skip_countergtidfailsusedpositiongtid_slave_posskipsettinggiven
Problem
I recently broke replication and when I tried to get past the one incorrect transaction. I got the following.
What do I need to do to fix this.
Update 1
```
MariaDB [(none)]> show variables like '%gtid%';
+------------------------+------------------------------------------+
| Variable_name | Value |
+------------------------+------------------------------------------+
| gtid_binlog_pos | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_binlog_state | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_current_pos | 1-1050-4819948,2-1051-379
MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.05 sec)
MariaDB [(none)]> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
ERROR 1966 (HY000): When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter cannot be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position.
MariaDB [(none)]> select @@gtid_slave_pos;
+---------------------------------------------+
| @@gtid_slave_pos |
+---------------------------------------------+
| 0-1051-1391406,1-1050-1182069,57-1051-98897 |
+---------------------------------------------+
1 row in set (0.00 sec)
MariaDB [(none)]> show variables like '%_pos%';
+----------------------+---------------------------------------------------------+
| Variable_name | Value |
+----------------------+---------------------------------------------------------+
| gtid_binlog_pos | 0-1051-1391406,2-1051-4474,57-1051-98897 |
| gtid_current_pos | 0-1051-1391406,1-1050-1182069,2-1051-4474,57-1051-98897 |
| gtid_slave_pos | 0-1051-1391406,1-1050-1182069,57-1051-98897 |
| wsrep_start_position | 00000000-0000-0000-0000-000000000000:-1 |
+----------------------+---------------------------------------------------------+What do I need to do to fix this.
Update 1
```
MariaDB [(none)]> show variables like '%gtid%';
+------------------------+------------------------------------------+
| Variable_name | Value |
+------------------------+------------------------------------------+
| gtid_binlog_pos | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_binlog_state | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_current_pos | 1-1050-4819948,2-1051-379
Solution
I found the following worked for me. This does not restore a slave into state that is an exact replica of master. There will be data differences.
I will use pt-table-sync to fix those.
2. Stop Parallel slave threads
3. Enable GTID replication
4. Using percona-toolkit pt-slave-restart to skip past all the errors.
Using master binglog position
This is well documented, Please google and find instructions.
This was part of the problem as seen in the original question.
I want to be able to skip events and not worry about trying to figure out or increase the GTID position for everyone.
Now if I check Parallel slave threads I see
I can reverse this process to re-enable Parallel slave threads when I am done. And I know that GTID is working.
I can now try restarting the slave with GTID enabled.
On the master
On the slave
Now when I check the slave it has some events to skip to get back into the same state as the master.
```
sudo yum install http://www.percona.com/download
I will use pt-table-sync to fix those.
- Restart Replication without GTID method
2. Stop Parallel slave threads
3. Enable GTID replication
4. Using percona-toolkit pt-slave-restart to skip past all the errors.
- Restart Replication without GTID method
Using master binglog position
CHANGE MASTER TO MASTER_HOST='12.34.56.789',MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS= 107;This is well documented, Please google and find instructions.
- Stop Parallel slave threads
This was part of the problem as seen in the original question.
ERROR 1966 (HY000): When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter cannot be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position.I want to be able to skip events and not worry about trying to figure out or increase the GTID position for everyone.
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.35 sec)
MariaDB [(none)]> set global slave_parallel_threads = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> set global slave_parallel_mode = none;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> Start SLAVE;
Query OK, 0 rows affected (0.00 sec)Now if I check Parallel slave threads I see
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
..........
Parallel_Mode: noneI can reverse this process to re-enable Parallel slave threads when I am done. And I know that GTID is working.
- Enable GTID replication
I can now try restarting the slave with GTID enabled.
On the master
MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mariadb-bin.000001
Position: 510
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
SELECT BINLOG_GTID_POS('mariadb-bin.000001', 510);
+--------------------------------------------+
| BINLOG_GTID_POS('mariadb-bin.000001', 510) |
+--------------------------------------------+
| 1-101-1 |
+--------------------------------------------+
1 row in set (0.00 sec)On the slave
STOP SLAVE;
SET GLOBAL gtid_slave_pos = '1-101-1';
CHANGE MASTER TO master_use_gtid=slave_pos;
START SLAVE;Now when I check the slave it has some events to skip to get back into the same state as the master.
Last_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Log_File: binary.000599
Read_Master_Log_Pos: 364810491
Relay_Log_File: tmsdb-relay-bin.001240
Relay_Log_Pos: 716
Relay_Master_Log_File: binary.000599
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1950
Last_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
Skip_Counter: 0
Exec_Master_Log_Pos: 286447058
Relay_Log_Space: 78364447
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1950
Last_SQL_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1050
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 1-1050-5005223,2-1051-379101,3-1010-3273
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: none
1 row in set (0.00 sec)- Using percona-toolkit pt-slave-restart to skip past all the errors
```
sudo yum install http://www.percona.com/download
Code Snippets
CHANGE MASTER TO MASTER_HOST='12.34.56.789',MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS= 107;MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.35 sec)
MariaDB [(none)]> set global slave_parallel_threads = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> set global slave_parallel_mode = none;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> Start SLAVE;
Query OK, 0 rows affected (0.00 sec)MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
..........
Parallel_Mode: noneMariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mariadb-bin.000001
Position: 510
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
SELECT BINLOG_GTID_POS('mariadb-bin.000001', 510);
+--------------------------------------------+
| BINLOG_GTID_POS('mariadb-bin.000001', 510) |
+--------------------------------------------+
| 1-101-1 |
+--------------------------------------------+
1 row in set (0.00 sec)STOP SLAVE;
SET GLOBAL gtid_slave_pos = '1-101-1';
CHANGE MASTER TO master_use_gtid=slave_pos;
START SLAVE;Context
StackExchange Database Administrators Q#189844, answer score: 4
Revisions (0)
No revisions yet.