HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

MySql is not optimizing the query properly

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
theproperlyquerymysqloptimizingnot

Problem

I have a table structure as follows :

CREATE TABLE `sale_product_inventories` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `sale_id` int(11) NOT NULL,
  `product_id` int(11) NOT NULL,
  `size` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `tier_number` int(11) NOT NULL DEFAULT '1',
  `sale_product_pool_id` int(11) DEFAULT NULL,
  `inventory` int(11) NOT NULL,
  `in_cart_units` int(11) DEFAULT '0',
  `size_display_order` tinyint(4) NOT NULL DEFAULT '0',
  `last_updated_by` int(11) DEFAULT '0',
  `created_by` int(11) DEFAULT '0',
  `status` enum('active','inactive') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'active',
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UNIQUE` (`sale_id`,`product_id`,`tier_number`,`size`,`sale_product_pool_id`)
) ENGINE=InnoDB AUTO_INCREMENT=92872 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


NOTE : I have an Index UNIQUE = sale_id,product_id,tier_number,size,sale_product_pool_id

When I run this query :

select * from sale_product_inventories 
where 
sale_id in (502,504)  and 
(sale_id, product_id) in ((502,2),(502,1), (502,3),(502,4) ,(504,2) ,(504,3) )


MySql Uses the index Unique and the execution time is 0.7 millisecond

BUT

when I run this query

select * from sale_product_inventories 
where 
(sale_id, product_id) in ((502,2),(502,1), (502,3),(502,4) ,(504,2) ,(504,3) )


MySql does not use the UNIQUE index and the execution time is 76 millisecond.

Mysql : 5.5.27 InnoDB Version : 1.1.8

My Question is Why is mysql behaving in such a way. Can some one please help me with this.

EDIT :

I came across this so thought it might be useful to add
MySQL generally can’t use indexes on columns unless the columns are isolated in the query. “Isolating” the column means it should not be part of an expression or be inside a function in the query.

Solution

The MySQL optimizer cannot optimize expressions in this format:

WHERE (col_1,col_2) IN ((a,b),(c,d),(e,f))


It's not a matter of getting the indexes right -- it appears that it's just not implemented.

The optimizer does not understand that this is equivalent to...

WHERE (col_1,col_2) IN ((a,b)) 
   OR (col_1,col_2) IN ((c,d)) 
   OR (col_1,col_2) IN ((e,f))


... or ...

WHERE (col_1 = a AND col_2 = b)
   OR (col_1 = c AND col_2 = d)
   OR (col_1 = e AND col_2 = f)


There's Bug #35819, which I originally found in this article, which was in turn mentioned in the comments on this post.

Unfortunately, I didn't find those until I had already broken out the new Optimizer Trace in MySQL 5.6 and run a few test cases through it. It seemed like a safe bet that if 5.6 couldn't handle it, then previous versions couldn't handle it.

It turns out that MySQL 5.6 indeed cannot handle it. The "set in set of sets" construct seems to simply not be something the optimizer catches on to. So in this case, it's not a matter of the optimizer choosing a full table scan over other plans -- the optimizer actually concludes that there aren't even any other possible plans to consider.

This is only true for multiple "row constructors" on the right hand side of IN. For a single expression, the optimizer does its thing and realizes this is equivalent to col_1 = a AND col_2 = b:

WHERE (col_1,col_2) IN ((a,b))    # is optimized correctly
WHERE (col_1,col_2) IN (ROW(a,b)) # is an equivalent expression in MySQL


Interestingly, your original EXPLAIN suggests the unique index was not being used in exactly the way that you may have believed it was being used, at any rate. It was only being used to find rows with the desired sale_id... not both values.

You'll notice in your original EXPLAIN that the key_len is shown as 4, meaning only the leftmost 4 bytes of the index are going to be examined -- sale_id, a 4-byte INT would be the leftmost 4 bytes in that index. The Using where means the optimizer realizes that additional filtering of the rows returned from the range scan may be needed to eliminate any rows that don't satisfy the rest of the WHERE clause -- all rows with sale_id 502 and 504 are being retrieved via the index regardless of their value for product_id, and then the resulting rows will be subsequently filtered to satisfy the additional constraints imposed by the WHERE.

The optimum path is probably to stick with (expr and expr) or (expr and expr) or (expr and expr) in your where clause. It's logically equivalent and the optimizer understands it.

Additional note, regarding some of your comments... in light of what I've discussed above, index hints will not help, because the optimizer appears unaware of the equivalence of the expression you've used to other expressions that it could handle... but as a point of reference, the reason it was syntactically invalid is that you have to use the name of the index, not the list of columns in the index. You've called your unique index 'UNIQUE', so the way to use this as an index hint would be in this format:

USE INDEX(`UNIQUE`)

Code Snippets

WHERE (col_1,col_2) IN ((a,b),(c,d),(e,f))
WHERE (col_1,col_2) IN ((a,b)) 
   OR (col_1,col_2) IN ((c,d)) 
   OR (col_1,col_2) IN ((e,f))
WHERE (col_1 = a AND col_2 = b)
   OR (col_1 = c AND col_2 = d)
   OR (col_1 = e AND col_2 = f)
WHERE (col_1,col_2) IN ((a,b))    # is optimized correctly
WHERE (col_1,col_2) IN (ROW(a,b)) # is an equivalent expression in MySQL
USE INDEX(`UNIQUE`)

Context

StackExchange Database Administrators Q#42918, answer score: 6

Revisions (0)

No revisions yet.