HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

MySQL - NOT IN ... vs LEFT JOIN

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
leftmysqlnotjoin

Problem

I am trying to understand in terms of performance which approach is better:

Query 1:

SELECT * FROM table1
WHERE col1 NOT IN(SELECT col1 FROM table2)


Query 2:

SELECT * FROM table1
LEFT JOIN table2 ON table1.col1 = table2.col1
WHERE table2.col1 IS NULL


The first query uses DEPENDENT SUBQUERY,The second uses simply index.

Solution

EXPLAIN tells you the difference.

If it shows DEPENDENT SUBQUERY it means that the subquery in IN() is executed once-per-row for the table1, that may be really many times if the table is big.

Different MySQL versions may apply different optimizations to the first query to get rid of the DEPENDENT SUBQUERY or at least minimize its performance impact:

  • use subquery cache - may be good if there are many duplicates in col1 values



  • change NOT IN into NOT EXISTS



  • materialize the subquery into a temporary table and (left)join to it



  • rewrite the query to left-join directly



If none of these is applied by your optimizer, use the second query because MySQL optimizes joins better (more so in older versions). Having a plan with DEPENDENT SUBQUERY in it is not effective in most cases.

IIRC the LEFT JOIN ... IS NULL may be evaluated as NOT EXISTS with specific optimization for that plan.

For both queries you should have an index on table2.col1 to allow fast execution of the subquery or fast detection of (non) existence of the row.

Context

StackExchange Database Administrators Q#136586, answer score: 5

Revisions (0)

No revisions yet.