InnoDB - Intermittent CPU spikes on a large database
Problem
I have a large MySQL 5.6 database (multiple tables with 10-100 million rows) running on a 16-core machine. It carries a considerable load, especially in the evening.
No matter what the load is though, we always get these intermittent spikes that can cause problems. Our CPU load looks like this:
(these are very load-light times, especially the time before 6am, should be basically "idle")
The only solution to the problem I have found so far is setting up a new server, mirroring the data and switching to that one. This usually buys me about 2-3 months, then the spikes start appearing again. Just restarting MySQL or rebooting the server does not change anything.
These are also not caused by cronjobs. Even if I disable all of them, this still happens.
Here is a gist of the InnoDB status right now:
https://gist.github.com/fleshgolem/de1d4a661fb545fabfda
And here is a dump of the server variables:
```
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
autocommit ON
automatic_sp_privileges ON
back_log 200
basedir /usr
big_tables OFF
bind_address *
binlog_cache_size 32768
binlog_checksum CRC32
binlog_direct_non_transactional_updates OFF
binlog_format ROW
binlog_max_flush_queue_time 0
binlog_order_commits ON
binlog_row_image FULL
binlog_rows_query_log_events OFF
binlog_stmt_cache_size 32768
block_encryption_mode aes-128-ecb
bulk_insert_buffer_size 8388608
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
collation_connection utf8_general_ci
collation_database latin1_swedish_ci
collation_server latin1_swedish_ci
completion_type NO_CHAIN
concurrent_insert AUTO
connect_timeout 10
core_file OFF
datadir /var/lib/mysql/
date_format %Y-%m-%d
datetime_format %Y-%m-%d %H:%i:
```
Solution
This is probably it:
query_cache_size 1048576000
query_cache_type ON
What is happening: a write comes in, and the entire 1 GB Query Cache (QC) has to be scanned to find every cached result for that table so it can be purged. That takes a lot of CPU time. Meanwhile, all SELECTs are blocked.
Do not set the size bigger than about 50M, regardless of how much RAM you have. It would probably be wise to also use DEMAND instead of ON, and hand-pick which SELECTs get SQL_CACHE and which get SQL_NO_CACHE.
Or it may be that the QC is not worth having on at all. This is the common case for production systems that have constant write traffic.
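The query cache advice above could be tried at runtime roughly as follows. This is a sketch only, assuming MySQL 5.6 and an account allowed to run SET GLOBAL. The table `orders` and its columns are hypothetical, used just to show where the SQL_CACHE and SQL_NO_CACHE hints go. The two options are alternatives, so run one of them, not both. SET GLOBAL changes are lost on restart unless they are also written to the server's option file.

```sql
-- Look at the current query cache counters first.
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- Option 1: keep the QC, but small and opt-in.
SET GLOBAL query_cache_size = 50 * 1024 * 1024;  -- about 50M
SET GLOBAL query_cache_type = DEMAND;            -- takes effect for new connections

-- Under DEMAND, only queries that ask for it are cached (hypothetical table):
SELECT SQL_CACHE id, status FROM orders WHERE id = 42;
-- Under ON, SQL_NO_CACHE keeps a query out of the cache:
SELECT SQL_NO_CACHE COUNT(*) FROM orders WHERE status = 'open';

-- Option 2: turn the QC off entirely.
SET GLOBAL query_cache_type = OFF;
SET GLOBAL query_cache_size = 0;
```

Re-running the SHOW GLOBAL STATUS statement after a few hours of traffic gives a before/after comparison of the Qcache counters.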
More...
Based on VARIABLES and GLOBAL STATUS
Observations:
Version: 5.6.19-log
50 GB of RAM
Uptime = 11d 07:07:58
You are not running on Windows.
Running 64-bit version
It appears that you are running both MyISAM and InnoDB.
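To see how much data actually sits in each engine, a query along these lines may help. It is a sketch that assumes the account can read information_schema. The table name in the ALTER statement is hypothetical.

```sql
-- Count tables and total size per storage engine, skipping system schemas.
SELECT engine,
       COUNT(*) AS table_count,
       ROUND(SUM(data_length + index_length) / 1024 / 1024) AS total_mb
FROM information_schema.TABLES
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
GROUP BY engine;

-- Converting one table (hypothetical name). This rebuilds the table and
-- blocks writes to it while it runs, so try it on a copy first.
ALTER TABLE my_myisam_table ENGINE = InnoDB;
```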
The More Important Issues
Either convert completely to InnoDB or tweak the cache sizes. Suggest:
innodb_buffer_pool_size = 30G -- you are not using all of the 40G now
key_buffer_size = 2G -- your current 8M is not efficient for writes
query_cache_size is really bad at 1000M. Your usage is moderately effective, so consider:
- Add SQL_CACHE or SQL_NO_CACHE to all SELECTs, based on which ones are likely to benefit,
- Decrease query_cache_size to only 100M
A lot of queries are using tmp tables and, worse, disk tmp tables. Using the slowlog, find out which queries are the most invasive; let's work on them.
Raise tmp_table_size and max_heap_table_size from 16M to 32M (but no more). Since there are two ways that tmp tables can turn into 'disk tmp tables', this might prevent some conversions.
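The slowlog and tmp table suggestions could be applied at runtime like this. It is a minimal sketch, assuming MySQL 5.6 and an account allowed to run SET GLOBAL. The EXPLAIN target is a hypothetical table, shown only to illustrate what to look for.

```sql
-- Turn on the slow query log and lower the cutoff to 2 seconds.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;   -- seconds; applies to new connections

-- Raise both limits together: the smaller of the two is the effective cap
-- for in-memory temp tables.
SET GLOBAL tmp_table_size      = 32 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 32 * 1024 * 1024;

-- For a suspect query (hypothetical table and columns), "Using temporary"
-- in the Extra column means a temp table is involved.
EXPLAIN SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;
```

As with any SET GLOBAL change, these values revert on restart unless they are also added to the option file.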
`slave_skip_errors = ALL` -- Sweeping problems under the rug. Big time!
Details and other observations
( Innodb_buffer_pool_pages_free * 16384 / innodb_buffer_pool_size ) = 1,864,255 * 16384 / 42949672960 = 71.1% -- buffer pool free
-- buffer_pool_size is bigger than working set; could decrease it
( Innodb_log_writes ) = 40,597,718 / 976078 = 42 /sec
( Com_rollback ) = 99,316,489 / 976078 = 101 /sec -- ROLLBACKs in InnoDB.
-- An excessive frequency of rollbacks may indicate inefficient app logic.
( local_infile ) = ON
-- local_infile = ON is a potential security issue
( Key_writes / Key_write_requests ) = 2,200,386 / 4113735 = 53.5% -- key_buffer effectiveness for writes
-- If you have enough RAM, it would be worthwhile to increase key_buffer_size.
( query_cache_size ) = 1000M -- Size of QC
-- Too small = not of much use. Too large = too much overhead. Recommend either 0 or no more than 50M.
( Qcache_not_cached ) = 235,136,200 / 976078 = 240 /sec -- SQL_CACHE attempted, but ignored
-- Rethink caching; tune qcache
( Qcache_inserts - Qcache_queries_in_cache ) = (258097280 - 7736) / 976078 = 264 /sec -- Invalidations/sec.
( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (1000M - 11519952) / 7736 / 8192 = 16.4 -- query_alloc_block_size vs formula
-- Adjust query_alloc_block_size
( Created_tmp_tables ) = 31,198,441 / 976078 = 32 /sec -- Frequency of creating "temp" tables as part of complex SELECTs.
( Created_tmp_disk_tables ) = 5,996,371 / 976078 = 6.1 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs
-- increase tmp_table_size and max_heap_table_size.
Check the rules for temp tables being able to use MEMORY instead of MyISAM. It may be possible to make a minor schema or query change to avoid MyISAM.
Better indexes and reformulation of queries are more likely to help.
( Handler_read_rnd_next ) = 1,066,165,445,800 / 976078 = 1092295 /sec -- High if lots of table scans
-- possibly inadequate keys
( Com_rollback / Com_commit ) = 99,316,489 / 49548802 = 200.4% -- Rollback : Commit ratio
-- Rollbacks are costly; change app logic
( Select_scan ) = 16,213,392 / 976078 = 17 /sec -- full table scans
-- Add indexes / optimize queries (unless they are tiny tables)
( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (57757683 + 26581027 + 0 + 0 + 34482709 + 0) / 976078 = 121 /sec -- writes/sec
-- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives
( expire_logs_days ) = 0 -- How soon to automatically purge binlog (after this many days)
-- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash.
(Not relevant if log_bin = OFF)
( slow_query_log ) = OFF -- Whether to log slow queries. (5.1.12)
( long_query_time ) = 10.000000 = 10 -- Cutoff (Seconds) for defining a "slow" query.
-- Suggest 2
( Aborted_clients / Connections ) = 33,444 / 45497 = 73.5% -- Threads bumped due to timeout
-- Increase wait_timeout; be nice, use disconnect
( Threads_created / Connections ) = 3,675 / 45497 = 8.1% -- Rapidity of process creation
-- Increase thread_cache_size
innodb_log_file_size is small (but hard to change).
Good caching in buffer_pool
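The per-second figures in the list above are status counters divided by Uptime (976078 seconds here). A sketch of how they could be recomputed on the server, assuming MySQL 5.6, where information_schema.GLOBAL_STATUS is available (later versions move these counters to performance_schema):

```sql
-- Per-second rates: counter / Uptime.
SELECT s.VARIABLE_NAME,
       s.VARIABLE_VALUE AS total,
       ROUND(s.VARIABLE_VALUE / u.VARIABLE_VALUE, 1) AS per_second
FROM information_schema.GLOBAL_STATUS AS s
CROSS JOIN (SELECT VARIABLE_VALUE
            FROM information_schema.GLOBAL_STATUS
            WHERE VARIABLE_NAME = 'UPTIME') AS u
WHERE s.VARIABLE_NAME IN ('COM_ROLLBACK', 'COM_COMMIT',
                          'CREATED_TMP_TABLES', 'CREATED_TMP_DISK_TABLES',
                          'SELECT_SCAN', 'HANDLER_READ_RND_NEXT');

-- Share of the buffer pool that is still free, in percent.
SELECT ROUND(100 * f.VARIABLE_VALUE * @@innodb_page_size
             / @@innodb_buffer_pool_size, 1) AS buffer_pool_free_pct
FROM information_schema.GLOBAL_STATUS AS f
WHERE f.VARIABLE_NAME = 'INNODB_BUFFER_POOL_PAGES_FREE';
```

Sampling these twice, some minutes apart, and comparing the totals gives rates for that window instead of an average over the whole uptime, which is more useful for catching intermittent spikes.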
Code Snippets
query_cache_size 1048576000
query_cache_type ON
innodb_buffer_pool_size = 30G -- you are not using all of the 40G now
key_buffer_size = 2G -- your current 8M is not efficient for writes
Context
StackExchange Database Administrators Q#131477, answer score: 2