HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

InnoDB Master-Master replication goes inconsistant after power failure tests

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
afterinconsistantinnodbgoestestsreplicationpowermasterfailure

Problem

Afternoon gents,

I'm currently stress testing a Master-Master replication setup using InnoDB as database engine.

We're using this simple script for testing which we run in Linux CLI from a remote server.

<?php

while(true) {
    try {
        $conn = mysql_connect('10.0.10.210', 'test', 'test');
        if ($conn) {
            mysql_select_db('testdb');
            $random = rand(0, 1000);
            $res = mysql_query("INSERT INTO test VALUES(0, 'test',    $random)");
            if ($res) {
                echo "\n inserted " . microtime();
            } else {
                echo "\n not inserted " . microtime();
            }
            mysql_close($conn);
        } else {
            echo "\n can not connect";
        }
    } catch (Exception $ex) {
        echo "\n can not insert"  . microtime();
    }
}

var_dump($res);
echo "ok";


The issues we're facing is that we're trying to shut off one of the hosts using nothing but unplugging the power, a hard power off that is.

We're also using MySQL-MMM for fail-over purposes, but that have nothing to do with the issues we're facing but I'll explain the procedure we're using now.

1) Master-Master working perfectly, server1 having virtual IP 10.0.10.210 and is serving writes and reads

2) We shut off server1 by unplugging the cable, the virtual IP gets moved to server2, everything is working and inserts continue after ~20 seconds downtime.

3) We start server1 again, it goes up and gets back the virtual IP address, inserts continues after 1-2 second downtime.

The problem with this is that we lose all inserts that happened during the downtime of server1, and if I type "STOP SLAVE; START SLAVE;" I get these errors:

[ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position', Error_code: 1236


And if I check the binary log manually with an offset that corresponds with what the mysqld.log file says

Solution

Replication and binary logging happen independently from innodb, which can unfortunately cause problems.

Check out: http://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_sync_binlog

From what you describe I suspect that sync_binlog is set to 0 for your servers. Leaving it at 0 means mysql will rely upon the filesystem to handle flushing to disk. Effectively this means that the binlog data will often be in the filesystem cache. That gets flushed to disk by the kernel at some interval, but in the case of a power failure anything in there is lost.

Setting sync_binlog to 1 will force mysql to flush the binlog event to the filesystem using fdatasync after every commit. This is safer (since you will lose 1 transaction at most in case of power failure), but creates a lot more disk activity. Benchmark and see what the impact is for your workload. Knowing the tradeoffs for both scenarios will hopefully help you make an informed decision.

Hope that helps.

Context

StackExchange Database Administrators Q#27892, answer score: 3

Revisions (0)

No revisions yet.