MySQL: How to optimize a certain SELECT statement which causes very high load?

Submitted by: @import:stackexchange-dba··

Problem

I have a table with 25,000,000 entries and the following query, which runs roughly every 2 seconds and causes very high load (up to 40 AAS on Amazon AWS). Execution takes anywhere from 20 seconds to 5 minutes, which causes timeouts in users' browsers and a high drop rate.

SELECT COUNT(*) AS `chk` 
  FROM ( SELECT `item_id` 
           FROM `items` 
          WHERE `item_status` IN (...) 
            AND `item_type` = ? 
            AND `user_id` != ? 
            AND `item_name` IN (...) 
          LIMIT 3 
       ) AS OTHERS
;


Index optimization is already done - user_id, item_name, item_type and item_status are all indexed (one single index per column).

More information:

  • One user has between 1 and 1,000,000 entries
  • item_name is a varchar(128)
  • item_type has a cardinality of 7
  • item_status also has a cardinality of 7
  • I just need to know whether there are three or more matches

Please note that in ~50% of the cases MySQL has to check the complete table, because fewer than 3 items are found. So the LIMIT only helps when there are 3 or more items.

While I do Redis caching for certain results, for this query it is not possible because an exact result is always needed. With the database growing with ~ 1 entry per second, the query performance is getting worse pretty fast.

While I could solve most problems with indexing in the past, I have a serious problem here. Thought about solving the problem with some trigger or a view, but I am not sure if that helps? Doing such a change without knowing if it solves the problem is dangerous in a highly active production database.

My question to the pros here is: How can I solve that problem with MySQL?

Solution

You could try using a composite index.

Your index could look like this:

INDEX (item_type, item_name, item_status, user_id)


This index should allow MySQL to find all matching rows efficiently, without having to look up any rows in the actual table.
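As a rough sketch, the index could be added with a statement like the one below. The index name idx_items_type_name_status_user is made up for illustration, and the ALGORITHM/LOCK clauses assume an InnoDB table, where adding a secondary index can normally be done in place without blocking writes. On a 25-million-row table the build will still take a while and add I/O load, so it is worth trying on a copy or replica first.

```sql
-- Hypothetical index name; the column order follows the answer above.
-- ALGORITHM=INPLACE, LOCK=NONE assumes InnoDB: MySQL rejects the
-- statement with an error instead of silently falling back to a
-- blocking table copy if the request cannot be honoured.
ALTER TABLE `items`
  ADD INDEX `idx_items_type_name_status_user`
    (`item_type`, `item_name`, `item_status`, `user_id`),
  ALGORITHM = INPLACE,
  LOCK = NONE;
```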

The order of the columns is important here:

  • item_type comes first since you have an equality filter on the column.
  • item_name and item_status come second and third since both use an IN. The order of these two columns could be swapped, but since item_name probably has a higher cardinality than item_status, having it first may be a bit faster.
  • user_id comes last since it uses an inequality filter, which cannot be handled efficiently with an index.

In general MySQL will only use 1 index per table¹, so having more than 1 index does not help.

Also, MySQL will not use an index if it does not look like the index will filter out enough rows (with "enough" being calculated by the query planner based on table and index statistics). The reason is that when using an index MySQL still needs to look up the original row in the table (unless you have a covering index²), which is somewhat expensive, so using an index to look up a lot of rows can actually be slower than doing a full table scan.

Given the number of rows in your table, most if not all of your 4 existing indexes will probably have too many rows per value and won't even be considered by MySQL for this query. Though this really depends on the data distribution and cardinality.
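To get a feel for the data distribution, something along these lines may help. SHOW INDEX reports the optimizer's cardinality estimate per index; the GROUP BY query counts the actual rows per value. Note that the second statement reads every entry of the item_type index, so on a table this size it is probably better run on a replica or outside peak hours.

```sql
-- Estimated number of distinct values per index (Cardinality column).
-- These are statistics-based estimates, not exact counts.
SHOW INDEX FROM `items`;

-- Actual rows per item_type value. With only 7 distinct values over
-- 25 million rows, each value matches millions of rows on average.
SELECT `item_type`, COUNT(*) AS `rows_per_value`
  FROM `items`
 GROUP BY `item_type`
 ORDER BY `rows_per_value` DESC;
```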

You can check which indexes are used, and whether they are covering or not, using EXPLAIN. For example:

EXPLAIN SELECT COUNT(*) AS `chk` FROM ( SELECT `item_id` FROM `items` WHERE `item_status` IN (...) AND `item_type` = ? AND `user_id` != ? AND `item_name` IN (...) LIMIT 3 ) AS OTHERS;


If you use MySQL 8 (8.0.18 or later) you can also use EXPLAIN ANALYZE to see how MySQL actually reads and filters the data, including timings.
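A sketch of both commands with the placeholders filled in is shown below. The literal values (1, 2, 3, 42, 'alpha', 'beta') are invented for illustration, since the question does not show the real ones or the column types. In the plain EXPLAIN output, the key column names the chosen index and "Using index" in the Extra column indicates that the index is covering. Keep in mind that EXPLAIN ANALYZE really executes the query, so it takes as long as the query itself.

```sql
-- Plan only; does not run the query.
EXPLAIN
SELECT COUNT(*) AS `chk`
  FROM ( SELECT `item_id`
           FROM `items`
          WHERE `item_status` IN (1, 2)            -- made-up values
            AND `item_type` = 3                    -- made-up value
            AND `user_id` != 42                    -- made-up value
            AND `item_name` IN ('alpha', 'beta')   -- made-up values
          LIMIT 3
       ) AS OTHERS;

-- MySQL 8.0.18+: runs the query and reports actual row counts and timings.
EXPLAIN ANALYZE
SELECT COUNT(*) AS `chk`
  FROM ( SELECT `item_id`
           FROM `items`
          WHERE `item_status` IN (1, 2)
            AND `item_type` = 3
            AND `user_id` != 42
            AND `item_name` IN ('alpha', 'beta')
          LIMIT 3
       ) AS OTHERS;
```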

Rick James has written a great document on how to build the best index for a SELECT, which I recommend reading. He also wrote a document about compound (composite) indexes, which I can recommend as well.

In the future please try to also provide at least the table schema and indexes (you can use SHOW CREATE TABLE ? for this) and the output of running the problematic query with EXPLAIN as well as the MySQL version. This reduces the amount of guessing we have to do and allows us to give more concrete and better answers.
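For this question, the requested information could be collected with statements like these (the EXPLAIN output for the problematic query would be added alongside):

```sql
-- Table definition, including all indexes and the storage engine.
SHOW CREATE TABLE `items`;

-- Server version string.
SELECT VERSION();
```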

¹ There is an optimization where MySQL can use more than 1 index called Index Merge Optimization, but it only works in some rare cases and should be avoided.
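If EXPLAIN shows index_merge in the type column, MySQL is combining several single-column indexes for the query. One way to compare the plan with and without that optimization, as a sketch, is to switch it off for the current session only:

```sql
-- Affects only the current connection, not other sessions.
SET SESSION optimizer_switch = 'index_merge=off';

-- ... re-run EXPLAIN for the query here and compare the plans ...

-- Restore the default behaviour for this session.
SET SESSION optimizer_switch = 'index_merge=on';
```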

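² A covering index is an index that contains all columns used in a query. Since all columns are already in the index, MySQL does not need to fetch other columns from the table itself. The index I proposed in this answer is an example of a covering index for this query, assuming item_id is the table's primary key (InnoDB secondary indexes implicitly contain the primary key columns).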


Context

StackExchange Database Administrators Q#314235, answer score: 9
