HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlMinor

Replication error

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
errorreplicationstackoverflow

Problem

We have a slave server that has stopped replication due to the following error:

Slave SQL: Query caused different errors on master and slave.


What could be the cause of this error? And what would be a way to fix it?

Version of both master and slave is MySQL 5.5.30

```
130726 23:55:45 [Note] C:\Program Files\MySQL\MySQL Server 5.5\bin\mysqld: Shutdown complete

130726 23:58:39 [Note] Plugin 'FEDERATED' is disabled.
130726 23:58:39 [Warning] C:\Program Files\MySQL\MySQL Server 5.5\bin\mysqld: ignoring option '--innodb-file-per-table' due to invalid value 'ON'
130726 23:58:39 [Note] Plugin 'InnoDB' is disabled.
130726 23:58:39 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
130726 23:58:39 [Note] - '0.0.0.0' resolves to '0.0.0.0';
130726 23:58:39 [Note] Server socket created on IP: '0.0.0.0'.
130726 23:58:39 [Note] Slave SQL thread initialized, starting replication
in log 'mysql-bin.000234' at position 1065421256,
relay log '.\slave-relay-bin.000917' position: 1065421402
130726 23:58:39 [Note] Slave I/O thread: connected to master 'replication@191.5.3.4:3306',
replication started in log 'mysql-bin.000235' at position 166680598
130726 23:58:39 [Note] Event Scheduler: Loaded 0 events
130726 23:58:39 [Note] C:\Program Files\MySQL\MySQL Server 5.5\bin\mysqld:
ready for connections.
Version: '5.5.30-log' socket: '' port: 3306 MySQL Community Server (GPL)
130726 23:59:04 [ERROR] Slave SQL: Query caused different errors on master and slave.
Error on master: message (format)='Incorrect key file for table '%-.200s';
try to repair it' error code=1034 ;
Error on slave: actual message='no error', error code=0.
Default database: 'shared'.
Query: 'CREATE TEMPORARY TABLE tmp_grades (
vehsysid INT(11),
grade INT(1),
dt TIMESTAMP,
INDEX(vehsysid),
INDEX(grade),
INDEX(dt)

Solution

I suspect that some of this is stating the obvious. I apologize for that but wanted to include it in the interest of thoroughness.

The slave stopping in this condition is the expected behavior.

Every query that gets written to the binlog has metadata that includes the error code that the query returned on the server that generated the binlog event. The error code is normally 0 for "no error." You can see this error code and the other metadata if you run the log through mysqlbinlog.

#130727 10:52:46 server id 10 end_log_pos 166 Query thread_id=673471 exec_time=1 error_code=0


If an error occurs after the query reaches a certain point in execution, the query still gets written to the binlog along with the error code.

The slave expects to encounter the same error that the master encountered (usually but not always 0) and stops when the values don't match to help avoid data inconsistencies that could otherwise result if replication was allowed to continue after such a condition was encountered.

The binlog data structures don't appear to have a place for the error message or the values that were sprintf()'ed into the message -- just the code -- so the slave has to translate that code into something human-readable. Lacking the values that the master would have plugged in to the message, it just displays the raw message format, including the placeholders.

# from source file include/mysqld_ername.h
{ "ER_NOT_KEYFILE", 1034, "Incorrect key file for table \'%-.200s\'; try to repair it" },


So what you're seeing as the last replication error on the slave is normal behavior when an error occurs on the master but the same query doesn't generate an error on the slave.

You can skip over the error with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; as you have probably already done by now, but recovery beyond that depends on what you did with the results from that temporary table. If the temporary table was created and populated by a single statement-based event, and wasn't used to do any kind of further table updates, you might be fine, because replication typically ignores errors that occur when trying to DROP TEMPORARY TABLE and the temporary table doesn't exist on the slave, since the slave thread has no reliable way of knowing whether the CREATE TEMPORARY TABLE statement actually appeared previously in the binlog or whether the slave might have stopped and started between the CREATE and DROP events on the temporary table (which would have destroyed any temporary tables that had been created by the slave SQL thread before the stop/start). On the other hand, if the temporary table was used to update other tables, you still might be fine because those updates might have been written as row-based events since you're in MIXED binlog format... but I would verify consistency among any tables you updated from the temporary table results, if there are any.

As for the errors that occurred on the master:

130725 23:15:57 [Warning] Warning: Enabling keys got errno 120 on shared.tmp_grades, retrying
130726 23:15:58 [Warning] Warning: Enabling keys got errno 137 on shared.tmp_grades, retrying


There is something I missed about these at first -- I thought they were 1 second apart, but these errors actually appear to have been on two different days, the 25th and 26th, so 24 hours and 1 second apart... 2 different warnings on the same temporary table name but in fact two different temporary tables generated a day apart.

MySQL error code 120: Didn't find key on read or update (HA_ERR_KEY_NOT_FOUND)
MySQL error code 137: No more records (read after end of file) (HA_ERR_END_OF_FILE)


At first I thought it was strange that you had no "Incorrect key file" messages in the master's log, but those wouldn't be in the log -- those would be errors returned to the client that issued the query.

I remember having seen errors like that in my servers' error logs, but it turns out, from what I can tell, those "Incorrect key file" errors only go to the server error log if they are encountered by the slave_sql thread or by a query executed by the event scheduler, since those are the only times there's not a client connection to receive the error... so it could be that whatever is running the query that creates the temporary table has generated a log of errors that might be of interest.

We can't technically assume that it was the 'tmp_grades' table that encountered the "Incorrect key file" error if the SELECT portion of the query created its own implicit temporary table... it could have been that table instead, causing an error that affected the query.

Looking into the internals, a bit, it looks like when you create a MyISAM temporary table with indexes in a CREATE TEMPORARY TABLE ... SELECT statement, the table is created with the indexes disabled and then the server builds and enables the indexes after the data is written rather than having to update indexes as it inserts dat

Code Snippets

#130727 10:52:46 server id 10 end_log_pos 166 Query thread_id=673471 exec_time=1 error_code=0
# from source file include/mysqld_ername.h
{ "ER_NOT_KEYFILE", 1034, "Incorrect key file for table \'%-.200s\'; try to repair it" },
130725 23:15:57 [Warning] Warning: Enabling keys got errno 120 on shared.tmp_grades, retrying
130726 23:15:58 [Warning] Warning: Enabling keys got errno 137 on shared.tmp_grades, retrying
MySQL error code 120: Didn't find key on read or update (HA_ERR_KEY_NOT_FOUND)
MySQL error code 137: No more records (read after end of file) (HA_ERR_END_OF_FILE)

Context

StackExchange Database Administrators Q#47117, answer score: 9

Revisions (0)

No revisions yet.