HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlMinor

SQL_SLAVE_SKIP_COUNTER = 1 fails, setting @@gtid_slave_pos used to skip a given GTID position

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sql_slave_skip_countergtidfailsusedpositiongtid_slave_posskipsettinggiven

Problem

I recently broke replication and when I tried to get past the one incorrect transaction. I got the following.

MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.05 sec)

MariaDB [(none)]> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
ERROR 1966 (HY000): When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter cannot be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position.
MariaDB [(none)]> select @@gtid_slave_pos;
+---------------------------------------------+
| @@gtid_slave_pos                            |
+---------------------------------------------+
| 0-1051-1391406,1-1050-1182069,57-1051-98897 |
+---------------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> show variables like '%_pos%';
+----------------------+---------------------------------------------------------+
| Variable_name        | Value                                                   |
+----------------------+---------------------------------------------------------+
| gtid_binlog_pos      | 0-1051-1391406,2-1051-4474,57-1051-98897                |
| gtid_current_pos     | 0-1051-1391406,1-1050-1182069,2-1051-4474,57-1051-98897 |
| gtid_slave_pos       | 0-1051-1391406,1-1050-1182069,57-1051-98897             |
| wsrep_start_position | 00000000-0000-0000-0000-000000000000:-1                 |
+----------------------+---------------------------------------------------------+


What do I need to do to fix this.

Update 1

```
MariaDB [(none)]> show variables like '%gtid%';
+------------------------+------------------------------------------+
| Variable_name | Value |
+------------------------+------------------------------------------+
| gtid_binlog_pos | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_binlog_state | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_current_pos | 1-1050-4819948,2-1051-379

Solution

I found the following worked for me. This does not restore a slave into state that is an exact replica of master. There will be data differences.
I will use pt-table-sync to fix those.

  1. Restart Replication without GTID method



2. Stop Parallel slave threads

3. Enable GTID replication

4. Using percona-toolkit pt-slave-restart to skip past all the errors.

  1. Restart Replication without GTID method


Using master binglog position

CHANGE MASTER TO MASTER_HOST='12.34.56.789',MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=  107;


This is well documented, Please google and find instructions.

  1. Stop Parallel slave threads



This was part of the problem as seen in the original question.

ERROR 1966 (HY000): When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter cannot be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position.

I want to be able to skip events and not worry about trying to figure out or increase the GTID position for everyone.

MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.35 sec)
MariaDB [(none)]> set global slave_parallel_threads = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> set global slave_parallel_mode = none;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> Start SLAVE;
Query OK, 0 rows affected (0.00 sec)


Now if I check Parallel slave threads I see

MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
     ..........
                Parallel_Mode: none


I can reverse this process to re-enable Parallel slave threads when I am done. And I know that GTID is working.

  1. Enable GTID replication



I can now try restarting the slave with GTID enabled.

On the master

MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: mariadb-bin.000001
        Position: 510
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

SELECT BINLOG_GTID_POS('mariadb-bin.000001', 510);
+--------------------------------------------+
| BINLOG_GTID_POS('mariadb-bin.000001', 510) |
+--------------------------------------------+
| 1-101-1                                    |
+--------------------------------------------+
1 row in set (0.00 sec)


On the slave

STOP SLAVE;
SET GLOBAL gtid_slave_pos = '1-101-1';
CHANGE MASTER TO master_use_gtid=slave_pos;
START SLAVE;


Now when I check the slave it has some events to skip to get back into the same state as the master.

Last_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.

MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
              Master_Log_File: binary.000599
          Read_Master_Log_Pos: 364810491
               Relay_Log_File: tmsdb-relay-bin.001240
                Relay_Log_Pos: 716
        Relay_Master_Log_File: binary.000599
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1950
                   Last_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 286447058
              Relay_Log_Space: 78364447
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1950
               Last_SQL_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1050
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-1050-5005223,2-1051-379101,3-1010-3273
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: none
1 row in set (0.00 sec)


  1. Using percona-toolkit pt-slave-restart to skip past all the errors



```
sudo yum install http://www.percona.com/download

Code Snippets

CHANGE MASTER TO MASTER_HOST='12.34.56.789',MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=  107;
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.35 sec)
MariaDB [(none)]> set global slave_parallel_threads = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> set global slave_parallel_mode = none;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> Start SLAVE;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
     ..........
                Parallel_Mode: none
MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: mariadb-bin.000001
        Position: 510
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

SELECT BINLOG_GTID_POS('mariadb-bin.000001', 510);
+--------------------------------------------+
| BINLOG_GTID_POS('mariadb-bin.000001', 510) |
+--------------------------------------------+
| 1-101-1                                    |
+--------------------------------------------+
1 row in set (0.00 sec)
STOP SLAVE;
SET GLOBAL gtid_slave_pos = '1-101-1';
CHANGE MASTER TO master_use_gtid=slave_pos;
START SLAVE;

Context

StackExchange Database Administrators Q#189844, answer score: 4

Revisions (0)

No revisions yet.