HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMinor

Why does IN (subquery) perform bad when = (subquery) is blazing fast?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fastwhybadblazingsubqueryperformdoeswhen

Problem

Have a look at these two queries:

Simple:
mysql> SELECT * FROM omgbatch JOIN omgoutput ON (idomgbatch=omgbatch) WHERE idomgoutput = (SELECT DISTINCT lastoutput FROM omgenvelope WHERE lastinput=18658);
+------------+-------+--------------------+-------------+---------+----------+
| idomgbatch | rutin | batchtime | idomgoutput | flowmod | omgbatch |
+------------+-------+--------------------+-------------+---------+----------+
| 12174 | 8 | 20140508040930-832 | 10728 | 162 | 12174 |
+------------+-------+--------------------+-------------+---------+----------+
1 row in set (0.00 sec)

and

Complex:
mysql> SELECT * FROM omgbatch JOIN omgoutput ON (idomgbatch=omgbatch) WHERE idomgoutput IN (SELECT DISTINCT lastoutput FROM omgenvelope WHERE lastinput=18658);
+------------+-------+--------------------+-------------+---------+----------+
| idomgbatch | rutin | batchtime | idomgoutput | flowmod | omgbatch |
+------------+-------+--------------------+-------------+---------+----------+
| 12174 | 8 | 20140508040930-832 | 10728 | 162 | 12174 |
+------------+-------+--------------------+-------------+---------+----------+
1 row in set (7.40 sec)

In this case I knew that the subquery returns only 1 value (12174) so I could use the = operator for the subquery result set and get the answer in 0.00 seconds.

When I keep the IN operator, it takes a whole 7.4 seconds.
I have changed my code into a loop for the subquery and a simple = in the "main" select, and all is well, but I'd still like to understand the mechanics behind this.

I'm too novice to understand all the details from the EXPLAIN command.

The query with IN lists three rows with select_type PRIMARY, PRIMARY, and DEPENDENT SUBQUERY, with corresponding row counts 10172, 1, and 721. Corresponding "keys": NULL, PRIMARY, fk_omgenvelope_omginput1_idx.

The = version that can only match 1 value instead does select_types PRIMARY, `P

Solution

According to this answer:

This is a known problem in MySQL. Rumors say it will be fixed.

The problem is due to a missing optimization when a subquery using IN is incorrectly indentified as dependent subquery instead of an independent subquery.

This causes the subquery to be run a huge number of times, which is unnecessary.

Context

StackExchange Database Administrators Q#64762, answer score: 3

Revisions (0)

No revisions yet.