HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Mysql (percona-server 5.5) completely stalls during import (myloader)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
completelyperconaduringmysqlmyloaderserverstallsimport

Problem

While configuring our new servers (IBM X3650M4, 2x Intel Xeon E5-2620, 64G RAM, RAID 10 (4x 240G SSD)) we bumped into a strange problem. We're trying to import dumps (mydumper) from our (old) production environment into our new one, but we're hardly able to do so.

The dump is 47G (compressed), +/- 4500 tables, the biggest tables are 7.1G (54285914 rows), 2.2G, 1.7G and 1.7G.

We're trying to run myloader with 12 (or 8) threads, but most of the times that fails. After a few minutes myloader comes across the large tables and just stalls. Several tools show no more activity (htop no cpu load, iostat no writes or reads), mytop shows 4 (long running) queries that never finish, most innotop screens are frozen (because SHOW INNODB STATUS does not return any results anymore). When I try to restart mysql that fails too, because mysql is not able to kill a number of threads. The only remedy is to use killall -9 mysqld mysql.

We are using the following mysql configuration:

```
# cat /etc/mysql/my.cnf
# Ansible managed

[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

ssl_cert = /etc/mysql/client-cert.pem
ssl_key = /etc/mysql/client-key.pem

[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
user = mysql
socket = /var/run/mysqld/mysqld.sock
pid_file = /var/run/mysqld/mysqld.pid
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc_messages_dir = /usr/share/mysql

general_log = 0
general_log_file = /var/log/mysql/mysql.log
log_error = /var/log/mysql/error.log
log_queries_not_using_indexes = 0
slow_query_log = 0
slow_query_log_file = /var/log/mysql/mysql-slow.log

skip_external_locking = 1
skip_name_resolve = 1

max_connections = 1000
wait_timeout = 28800
interactive_timeout = 28800

sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

character_set_server = utf8
collation_server = utf8_general_ci
skip-character-set-client-handshake
init_connect='SET collation_connection = utf8_general_ci; SET NAMES utf8;'

d

Solution

There is a known issue when certain checkpointing / flushing configuration results in stall. Here is a quote from Percona 5.5 documentation Improved InnoDB I/O Scalability:


Normally, the checkpoint is done passively at the current oldest page
modification (this is called “fuzzy checkpointing”). When the
checkpoint age nears the maximum checkpoint age (determined by the
total length of all transaction log files), InnoDB tries to keep the
checkpoint age away from the maximum by flushing many dirty blocks.
But, if there are many updates per second and many blocks have almost
the same modification age, the huge number of flushes can cause
stalls.


[...]


keep_average [2]: This method attempts to keep the I/O rate constant
by using a much shorter loop cycle (0.1 second) than that of the other
methods (1.0 second). It is designed for use with SSD cards

keep_average is designed to be used with SSD cards which are faster than SSD disks. In your case InnoDB uses very short cycle to come back to flush dirty pages. Then, as per docs, too many pages with very close timestamps cause congestion and stall.
Unless you are using SSD cards like Fusion IO you should not use this setting.

Hope this helps.

Context

StackExchange Database Administrators Q#120923, answer score: 3

Revisions (0)

No revisions yet.