HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Speeding up a NOT IN subquery

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
speedingsubquerynot

Problem

Good evening,

I'm currently up to try to improve the performance of some of my queries.
As far as I know, statements like "IN" or "NOT IN" are even faster with a large amount of values if a subquery on an indexed field without conditions is used.

SELECT * FROM table1 WHERE field1 NOT IN (SELECT index_field FROM table2)


When it comes to use conditions at table2, the query becomes realy slow on a large amount of data.

SELECT *
FROM table1
WHERE
    field1 NOT IN (
        SELECT
            index_field
        FROM table2
        WHERE
            user_id = '2'
    )


I was thinking about using LEFT Join but when I need to to the filtering for the user_id, it also becomes slow.

How may I solve this problem? Currently, I've no clue about that.

The original query looked sth. like this

SELECT
    i.*
FROM stream_item si
LEFT JOIN user__publisher_item ui
    ON ui.user_id = 2
    && ui.item_id = si.item_id
INNER JOIN item i
    ON i.id = si.item_id
WHERE 
    si.stream_id IN (5,7,8,9,19,24,29,42,43,44)
    && ui.id IS NULL || (
        ui.`read` = 0 || ui.saved = 1
    )
GROUP BY
    si.item_id
ORDER BY
    si.`found` DESC
LIMIT 0,10


```
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE si index PRIMARY,stream_2_item,stream_id_found stream_2_item 4 \N 663236 Using temporary; Using filesort
1 SIMPLE ui eq_ref user_id_item_id,user_2_item,user_id_read_saved_hidden,user_id_saved,user_id_hidden user_id_item_id 8 const,si.item_id 1 Using where
1 SIMPLE i eq_ref PRIMARY PRIMARY 4 si.item_id

Solution

In general you can replace a NOT IN (...) subquery with a NOT EXISTS (...) and the latter will be faster. Logically they are the same though some database optimizers handle the NOT EXISTS better as they can stop the subquery check after the first match.

In contrast, if the DB optimizer isn't smart enough to figure out what your doing and the subquery has a lot of rows, then it must buffer all the rows for each subquery before performing each NOT IN check.

Try out something like this:

SELECT *
FROM table1 a
WHERE NOT EXISTS(SELECT 1
                   FROM table2 b
                  WHERE b.user_id = 2
                    AND b.index_field = a.field1)


In this example the optimizer need only check if a single row exists for the matching subquery, including the additional filter on b.user_id = 2. It should use an index on the index_field and user_id if they are available.

Code Snippets

SELECT *
FROM table1 a
WHERE NOT EXISTS(SELECT 1
                   FROM table2 b
                  WHERE b.user_id = 2
                    AND b.index_field = a.field1)

Context

StackExchange Database Administrators Q#46702, answer score: 3

Revisions (0)

No revisions yet.