HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Is it safe to disable sync_binlogs, innodb_flush_log_at_trx_commit and innodb_doublewrite on a site-redundant galera cluster?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
disableinnodb_doublewriteredundantsiteinnodb_flush_log_at_trx_commitsafeandgalerasync_binlogscluster

Problem

On a site-redundant galera cluster, a commit should only return after a quorum of database nodes have accepted the transaction. If one database node goes down, all commits will be preserved by the rest of the cluster, and the downed database node will get in sync with the rest of the cluster when starting up the database again. If the database should have become corrupt, one can always nuke the mysql data dir and start with an empty database, it will eventually catch up with the rest of the cluster. Hence it sounds like we could tweak performance by turning off options for ensuring local ACID-compliance.

So the question here is really ... "What could possibly go wrong?"

:-)

Based on the comments, I'll give some information about our particular setup:

-
The cluster consists of three nodes. Two of them are actively used in production, one of them more heavily used than the other. The third node is only used for quorum and backup purposes.

-
Site-redundant means that the nodes are in different server centers. I find it hard to think of anything that would cause two nodes to go down simultaneously - except a serious mysql bug, how likely is that? Admittedly, two of the nodes are situated less than 10 km apart (backup/quorum node is separated by several hundred km plus a national border). A medium-sized nuclear bomb could potentially take out two of the nodes simultaneously ... then again, "we have problems with the database" is probably the least of our concerns in such a scenario. Could a solar storm possibly take out both or all of the servers simultaneously?

-
Our performance problem is primary due to the write cache on our SAN going full from time to time. We're working on mitigating this issue, but we can never guarantee that it won't happen again. Every now and then we experience "hicups" where we have transactions waiting for some 10-30 seconds.

-
In this particular setup, a 30s delay may actually be a matter of life and death. Well, most l

Solution

I've done some research on this now, so I'll answer on my own question.

Summary first

If the performance problems are observed only on one node, it's perfectly safe to turn off innodb_doublewrite and innodb_flush_log_at_trx_commit on the affected node - just remember to use one of the other nodes for bootstrapping should the whole cluster fail.

# echo "[server]" >> /etc/mysql/conf.d/temp_perfomance_hax.cnf
# echo "innodb_doublewrite = 0" >> /etc/mysql/conf.d/temp_performance_hax.cnf
# echo "innodb_flush_log_at_trx_commit = 2" >> /etc/mysql/conf.d/temp_performance_hax.cnf
# systemctl restart mysql # innodb_doublewrite cannot be set run-time


Also, in a true site-redundant galera cluster, it should be reasonably safe to put innodb_flush_log_at_trx_commit to 2 and sync_binlog to 0 on any number of nodes. If one is experiencing performance problems, if uptime and performance is important, and if one lost transaction probably isn't a matter of life and death - then no hesitations needed, just go ahead and do it.

set global sync_binlog = '0';
set global innodb_flush_log_at_trx_commit = 2;


and one can probably go ahead and turn off double write as well if performance is more important than ACID-compliance. The probability of disasters is pretty low.

I can see two reasons for sticking to the "safe" settings:

  • Preparedness for the worst-case-scenario: what if the whole cluster melts down simultaneously?



  • More robustness on the nodes



Node robustness: keep innodb_doublewrite turned on

innodb_doublewrite can be safely turned off at one node without losing ACID-complience, but it may still be better to have it turned on. According to When is it safe to disable InnoDB doublewrite buffering? turning off innodb-doublewrite may cause corrupted data, in that case it may be needed to wipe the data partition on the downed node and start it up again. This comes with three costs:

  • manual sysadmin intervention



  • elevated levels of IO on the cluster while the corrupted node is performing a SST



  • the cluster will have to run with reduced redundancy until the node is fully operative again.



(As far as I can understand, a missing flush on log files shouldn't cause problems starting a db node - i.e. for innodb_flush_log_at_trx_commit the manual explicitly states that "InnoDB's crash recovery works regardless of the value")

Reasons why the whole cluster may burn down at once

Hard explosions

Consider innodb_flush_log_at_trx_commit to be set to 2, binlog_sync to be set to 0 and innodb-doublewrite-buffering set to 1, this may give performance gains, and as far as I can understand the only failure mode it won't cover is if a quorum of nodes are suffering a power-loss or kernel panic. For a true multi-site-setup, those seems rare enough that one may want to neglect them.

-
If the nodes are located nearby, a local problem, i.e. power outage, flooding, fire, etc can knock all of the nodes out in the same time. I've multiple times witnessed a whole server site going down for the weirdest reasons even though everything was supposed to be redundant.

-
Could solar storms affect the whole cluster simultaneously? Especially if the cluster is located in the same city and in polar regions? I've seen two laptops crashing simultaneously, being in the polar region during a solar storm, and with no other plausible reasons (different OS'es, it wasn't under a leap second, etc).

-
If all nodes runs the same OS, some OS kernel problem could potentially cause all nodes to crash hard with a kernel panic in exactly the same time.

Software crashes

I believe that if innodb_flush_log_at_trx_commit is set to 0 or innodb_flush_log_at_trx_commit is set to 0, then a mysql crash affecting all nodes simultaneously could cause data loss.

Has anyone ever observed galera crashing in such a way?

A system admin or buggy script doing "kill -9" simultaneously on mysql on all the nodes too ... but if the system admin wants to cause data loss, he will probably find a way regardless of those settings.

Cluster going down for other reasons

In those cases, those settings shouldn't matter:

-
Sudden and total network isolation between all three nodes, or leaving the cluster fragmented so no quorum can be found. One would need to bootstrap the cluster after such an incident, and if one really want to be 100% sure that no transactions are lost, one needs to manually investigate binlogs (see more on that below) - but since neither mysql nor the OS has crashed, all state will eventually be synced, no matter the sync settings.

-
Bad timing - i.e. one node going down and while it's down another node could go down or there could be some network isolation. Afterwards, the cluster needs to be bootstrapped. However, everything that the nodes have received will be written to disk, so the above mentioned settings will not affect anything. (We've had this coming over us once).

Manual intervention is needed on cluster failures

No m

Code Snippets

# echo "[server]" >> /etc/mysql/conf.d/temp_perfomance_hax.cnf
# echo "innodb_doublewrite = 0" >> /etc/mysql/conf.d/temp_performance_hax.cnf
# echo "innodb_flush_log_at_trx_commit = 2" >> /etc/mysql/conf.d/temp_performance_hax.cnf
# systemctl restart mysql # innodb_doublewrite cannot be set run-time
set global sync_binlog = '0';
set global innodb_flush_log_at_trx_commit = 2;

Context

StackExchange Database Administrators Q#160553, answer score: 3

Revisions (0)

No revisions yet.