patternsqlMinor
Mysql on RDS avoiding index_merge
Viewed 0 times
rdsmysqlavoidingindex_merge
Problem
I'm somewhat experienced with mysql optimizations and routines but something caught me recently.
I was using an on-premise mysql installation over ec2 instances and just migrated to RDS so I can sleep tight without worrying.
Problem is that my application has some legacy issues and was previously running on mysql 5.1 and it's now running on mysql 5.6.
Our database was built upon the premise that index_merge will be used by mysql, so our indexes are not composite; they all affect single columns; I know how bad this might be and how bad it is, but it's been working fine so far; I can't really change it right now because we have over 300 tables.
What got me into trouble is that one of my tables controls inventory stock. The table has the following structure:
and a few other columns that won't really matter.
When I run a select query to obtain the last price for a given product, like this:
What I expected was that MySQL would merge (most likely) the productId and companyId indexes, that are very specific, and read about 4 lines; but what actually happened is that MySQL decided to iterate over 5 million lines by sorting the transactionDate and not merging it to any other index.
I decided to dig a bit further:
I was using an on-premise mysql installation over ec2 instances and just migrated to RDS so I can sleep tight without worrying.
Problem is that my application has some legacy issues and was previously running on mysql 5.1 and it's now running on mysql 5.6.
Our database was built upon the premise that index_merge will be used by mysql, so our indexes are not composite; they all affect single columns; I know how bad this might be and how bad it is, but it's been working fine so far; I can't really change it right now because we have over 300 tables.
What got me into trouble is that one of my tables controls inventory stock. The table has the following structure:
id (PK)
companyId (btree index)
productId (btree index)
transactionType (input, output or stock balance; btree index)
transactionDate (btree index)
transactionPriceand a few other columns that won't really matter.
When I run a select query to obtain the last price for a given product, like this:
SELECT
transactionPrice
FROM
stock
WHERE
productId = x
AND transactionType = 'input'
AND companyId = y
AND transactionDate < '2017-07-07'
ORDER BY transactionDate DESC
LIMIT 1;What I expected was that MySQL would merge (most likely) the productId and companyId indexes, that are very specific, and read about 4 lines; but what actually happened is that MySQL decided to iterate over 5 million lines by sorting the transactionDate and not merging it to any other index.
I decided to dig a bit further:
SHOW @@optimizer_switchshows me that index_merge flags are all ON
- Ran the query hinting the optimizer to
IGNOREthe transactionDate index; id did decide to index_merge productId and companyId as expected!
- Created a new compound index for this specific table that aggregated all fields and it is now being used, so my problem is partially solved, as this is
Solution
I have found over the years that MySQL very rarely uses Index Merge. I think that part of the reason is that it is so inefficient. In order to perform it, it must
Even when Index Merge could be used, it is essentially guaranteed to be slower than a suitable composite index.
For the query in question, this index is optimal, even handling the
On a related topic... I assume this is a very large table? How big is the buffer_pool? Is it I/O-bound, or fully cached?
I ask these because, if it is I/O-bound, the choice of the
More useful info: How many different products? transactionTypes - apparently 3? companies? Are they evenly distributed?
- Scan a potentially large part of one index, collecting the results somewhere.
- Scan a potentially large part of another index, collecting the results somewhere.
- "Merge" the results into a smaller result set (in the case of "intersect", which seems to be your case).
- Then go look up the rows.
Even when Index Merge could be used, it is essentially guaranteed to be slower than a suitable composite index.
For the query in question, this index is optimal, even handling the
ORDER BY (thereby avoiding a tmp and sort), which I don't think Index Merge can do:INDEX(productId, transactionType, companyId, -- in any order
transactionDate) -- lastOn a related topic... I assume this is a very large table? How big is the buffer_pool? Is it I/O-bound, or fully cached?
I ask these because, if it is I/O-bound, the choice of the
PRIMARY becomes important in performance. Would you care to show usSHOW CREATE TABLE`; I will explain further. Based on the info you provided, I would guess that the rows needed for your query are scattered across the table, and not 'clustered' in a few blocks.More useful info: How many different products? transactionTypes - apparently 3? companies? Are they evenly distributed?
Code Snippets
INDEX(productId, transactionType, companyId, -- in any order
transactionDate) -- lastContext
StackExchange Database Administrators Q#178286, answer score: 2
Revisions (0)
No revisions yet.