patternsqlMinor
High CPU system time usage on MySQL server
Viewed 0 times
systemhightimeusagemysqlservercpu
Problem
A little backstory, some time ago we started to experience high CPU system time on one of our MySQL databases. This database was also suffering from high disk utilization so we figured that those things are connected. And since we already had plans to migrate it to SSD we thought that it will solve both issues.
It helped... but not for long.
For a couple of weeks after migration CPU graph was like this:
But now we are back to this:
This happened out of nowhere, without any apparent changes in load or application logic.
DB stats:
MySQL config:
```
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /opt/mysql-data
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
explicit_defaults_for_timestamp
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
log-error = /opt/mysql-log/error.log
# Replication
server-id = 76
gtid-mode = ON
enforce-gtid-consistency = true
relay-log = /opt/mysql-log/mysql-relay-bin
relay-log-index = /opt/mysql-log/mysql-relay-bin.index
replicate-wild-do-table = dbname.%
log-bin = /opt/mysql-log/mysql-bin.log
expire_logs_days = 7
max_binlog_size = 1024M
binlog-format = ROW
log-bin-trust-function-creators = 1
log_slave_updates = 1
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# * IMPORTANT: Additional settings that can override those from this file!
# The files must end with '.cnf', otherwise they'll be ignored.
#
It helped... but not for long.
For a couple of weeks after migration CPU graph was like this:
But now we are back to this:
This happened out of nowhere, without any apparent changes in load or application logic.
DB stats:
- MySQL version - 5.7.20
- OS - Debian
- DB size - 1.2Tb
- RAM - 700Gb
- CPU cores - 56
- Peek load - about 5kq/s read, 600q/s write (although select queries are often pretty complex)
- Threads - 50 running, 300 connected
- It has about 300 tables, all InnoDB
MySQL config:
```
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /opt/mysql-data
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
explicit_defaults_for_timestamp
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
log-error = /opt/mysql-log/error.log
# Replication
server-id = 76
gtid-mode = ON
enforce-gtid-consistency = true
relay-log = /opt/mysql-log/mysql-relay-bin
relay-log-index = /opt/mysql-log/mysql-relay-bin.index
replicate-wild-do-table = dbname.%
log-bin = /opt/mysql-log/mysql-bin.log
expire_logs_days = 7
max_binlog_size = 1024M
binlog-format = ROW
log-bin-trust-function-creators = 1
log_slave_updates = 1
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# * IMPORTANT: Additional settings that can override those from this file!
# The files must end with '.cnf', otherwise they'll be ignored.
#
Solution
600q/s write with a flush per commit is probably hitting the limit of your current spinning disks. Switching to SSDs would relieve the pressure.
The quick fix (before getting SSDs) is probably to change to this setting:
But read the caveats on making that change.
Having that setting and SSDs would let you grow further.
Another possible fix is to combine some writes into a single
Almost always, high CPU and/or I/O is due to poor indexes and/or poor formulation of queries. Turn on the slowlog with
Note that with 50 "running" threads, there is a lot of contention; this may be causing the switching, etc, that you noted. We need to get queries to finish faster. With 5.7, the system may keel over with 100 running threads. Increasing beyond about 64, the context switches, mutexes, locks, etc, conspire to slow down every thread, leading to no improvement in throughput while latency goes through the roof.
For a different approach to analyzing the problem, please provide
Analysis of VARIABLES & STATUS
(Sorry, nothing jumps out as addressing your Question.)
Observations:
The More Important Issues:
A lot of temp tables, especially disk-based, are created for complex queries. Let's hope that the slow log will identifiy some queries that can be improved (via indexing / reformulating / etc.)
Other indicators are joins without indexes and sort_merge_passes; however, neither of these is conclusive, we need to see the queries.
Consider throttling the number of connections by throttling the clients. Are you using Apache, Tomcat, or something else?
70
Increasing the number of statements in each
Thousands of tables is usually not a good design.
Why so many CREATE+DROP PROCEDURE ?
Why so many SHOW commands?
Details and other observations:
-- check innodb_buffer_pool_size
-- Several hundred is usually good.
-- Very busy InnoDB.
-- (see minutes)
-- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)
-- An excessive frequency of rollbacks may indicate inefficient app logic.
-- May be caused by complex queries that could be optimized.
-- (Symptom of other issues)
-- Something big is going
The quick fix (before getting SSDs) is probably to change to this setting:
innodb_flush_log_at_trx_commit = 2But read the caveats on making that change.
Having that setting and SSDs would let you grow further.
Another possible fix is to combine some writes into a single
COMMIT (where the logic is not violated).Almost always, high CPU and/or I/O is due to poor indexes and/or poor formulation of queries. Turn on the slowlog with
long_query_time=1, wait a while, then see what turns up. With queries in hand, provide SELECT, EXPLAIN SELECT ..., and SHOW CREATE TABLE. Ditto for the write queries. From those, we can probably tame the CPU and/or I/O. Even with your current setting of 3, pt-query-digest might find some interesting things.Note that with 50 "running" threads, there is a lot of contention; this may be causing the switching, etc, that you noted. We need to get queries to finish faster. With 5.7, the system may keel over with 100 running threads. Increasing beyond about 64, the context switches, mutexes, locks, etc, conspire to slow down every thread, leading to no improvement in throughput while latency goes through the roof.
For a different approach to analyzing the problem, please provide
SHOW VARIABLES and SHOW GLOBAL STATUS? More discussion here.Analysis of VARIABLES & STATUS
(Sorry, nothing jumps out as addressing your Question.)
Observations:
- Version: 5.7.20-log
- 700 GB of RAM
- Uptime = 36d 13:21:34
- You are not running on Windows.
- Running 64-bit version
- You appear to be running entirely (or mostly) InnoDB.
The More Important Issues:
A lot of temp tables, especially disk-based, are created for complex queries. Let's hope that the slow log will identifiy some queries that can be improved (via indexing / reformulating / etc.)
Other indicators are joins without indexes and sort_merge_passes; however, neither of these is conclusive, we need to see the queries.
Max_used_connections = 701 is >= Max_connections = 700, so there were probably some connections refused. Also, if that indicated more than, say, 64 threads running, then system performance probably sufferd at that time.Consider throttling the number of connections by throttling the clients. Are you using Apache, Tomcat, or something else?
70
Threads_running indicates that, at the moment of doing this SHOW, the system was in trouble.Increasing the number of statements in each
COMMIT (when reasonable) may help performance some.innodb_log_file_size, at 15GB, is bigger than necessary, but I see not need to change it.Thousands of tables is usually not a good design.
eq_range_index_dive_limit = 200 concerns me, but I don't know how to advise. Was it a deliberate choice?Why so many CREATE+DROP PROCEDURE ?
Why so many SHOW commands?
Details and other observations:
( Innodb_buffer_pool_pages_flushed ) = 523,716,598 / 3158494 = 165 /sec -- Writes (flushes)-- check innodb_buffer_pool_size
( table_open_cache ) = 10,000 -- Number of table descriptors to cache-- Several hundred is usually good.
( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((61,040,718 + 523,716,598) ) / 3158494 = 185 /sec -- InnoDB I/O( Innodb_dblwr_pages_written/Innodb_pages_written ) = 459,782,684/523,717,889 = 87.8% -- Seems like these values should be equal?( Innodb_os_log_written ) = 1,071,443,919,360 / 3158494 = 339226 /sec -- This is an indicator of how busy InnoDB is.-- Very busy InnoDB.
( Innodb_log_writes ) = 110,905,716 / 3158494 = 35 /sec( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 1,071,443,919,360 / (3158494 / 3600) / 2 / 15360M = 0.0379 -- Ratio-- (see minutes)
( Uptime / 60 innodb_log_file_size / Innodb_os_log_written ) = 3,158,494 / 60 15360M / 1071443919360 = 791 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf.-- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)
( Com_rollback ) = 770,457 / 3158494 = 0.24 /sec -- ROLLBACKs in InnoDB.-- An excessive frequency of rollbacks may indicate inefficient app logic.
( Innodb_row_lock_waits ) = 632,292 / 3158494 = 0.2 /sec -- How often there is a delay in getting a row lock.-- May be caused by complex queries that could be optimized.
( Innodb_dblwr_writes ) = 97,725,876 / 3158494 = 31 /sec -- "Doublewrite buffer" writes to disk. "Doublewrites" are a reliability feature. Some newer versions / configurations don't need them.-- (Symptom of other issues)
( Innodb_row_lock_current_waits ) = 13 -- The number of row locks currently being waited for by operations on InnoDB tables. Zero is pretty normal.-- Something big is going
Code Snippets
innodb_flush_log_at_trx_commit = 2Com_commit = 529 /sec
Com_create_procedure = 0.01 /HR
Com_drop_procedure = 0.01 /HR
Com_delete = 12 /sec
Com_delete_multi = 63 /sec
Com_insert = 219 /sec
Com_kill = 0.69 /HR
Com_reset = 0.0011 /HR
Com_revoke = 0.0023 /HR
Com_select = 2141 /sec
Com_show_binlogs = 12 /HR
Com_show_create_func = 0.011 /HR
Com_show_privileges = 0.0034 /HR
Com_show_profile = 0.027 /HR
Com_show_profiles = 0.028 /HR
Com_show_slave_status = 0.037 /sec
Com_show_storage_engines = 12 /HR
Com_show_warnings = 0.14 /sec
Com_slave_stop = 0.0011 /HR
Com_update_multi = 25 /sec
Created_tmp_files = 0.3 /sec
Handler_commit = 3251 /sec
Handler_external_lock = 18787 /sec
Handler_prepare = 615 /sec
Handler_read_first = 239 /sec
Handler_read_key = 173669 /sec
Handler_read_next = 1291439 /sec
Handler_read_prev = 28535 /sec
Handler_read_rnd = 32789 /secInnodb_buffer_pool_bytes_dirty = 7.03e+10
Innodb_buffer_pool_pages_data = 3.41e+7
Innodb_buffer_pool_pages_dirty = 4.29e+6
Innodb_buffer_pool_pages_misc = 2.15e+6
Innodb_buffer_pool_pages_total = 3.62e+7
Innodb_data_fsyncs = 132 /sec
Innodb_data_writes = 232 /sec
Innodb_data_written = 5440151 /sec
Innodb_dblwr_pages_written = 145 /sec
Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group = 582.3MB
Innodb_pages_written = 165 /sec
Innodb_row_lock_time = 5.97e+7
Innodb_rows_deleted + Innodb_rows_inserted = 2180 /sec
Innodb_rows_inserted = 2155 /sec
Innodb_rows_read = 1398531 /sec
Max_used_connections = 701
Open_tables = 10,000
Select_full_range_join = 2.57e+7
Select_range = 130 /sec
Sort_range = 30 /sec
Sort_scan = 332 /sec
Table_open_cache_hits = 9354 /sec
Threads_running = 71
eq_range_index_dive_limit = 200
innodb_purge_threads = 4
innodb_thread_sleep_delay = 16,925Context
StackExchange Database Administrators Q#195338, answer score: 6
Revisions (0)
No revisions yet.