HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query performance with subquery and IN clause

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withquerysubqueryperformanceandclause

Problem

I am trying to select a range of data for multiple devices (unique serial numbers) from a historical table and was wondering why there is such a big difference in time for the following queries:

Basically I am trying to use the IN clause to indicate what items I want to fetch data for. If I "hard code" the items in the IN clause, the query is fast, if I use a subquery or join to select the items the performance is poor.

This query completes in 0.15s and returns 7382 rows.

SELECT `readings`.* FROM `readings`
WHERE
  (SerialNumber IN ('091146000121', *snip 25*, '091146000556'))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASC


The same query rewritten using a subquery to get the serial numbers takes over 30 seconds, and seems to spend most of its time in the Preparing state. It returns the same data as the first query.

SELECT `readings`.* FROM `readings`
WHERE
  (SerialNumber IN (SELECT `boards`.`id` AS `SerialNumber` FROM `boards` WHERE (siteId = '1')))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASC


The subquery returns the same values that are in the first query, but as stated, this takes a lot longer to run. Are they not functionally equivalent?

Here is the explain for both queries:

```
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------ +------+-----------------------------+
| 1 | SIMPLE | readings | range | PRIMARY,time | PRIMARY | 22 | NULL | 7339 | Using where; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+

+----+--------------------+----------+-----------------+----------------+---------+-

Solution

Refactor the query as follows:

SELECT
    readings.*
FROM
    (
        SELECT boxsn FROM readings
        WHERE (time >= 1325404800) 
        AND (time < 1326317400) 
        ORDER BY `time` ASC
    ) readings_keys
    LEFT JOIN
    (
        SELECT id AS boxsn FROM boards WHERE siteId = '1'
    ) boards
    USING (boxsn)
    LEFT JOIN readings
    USING (boxsn)
;


Make sure you have the following indexes:

ALTER TABLE boards ADD INDEX siteId_id_ndx (siteId,id);
ALTER TABLE readings ADD INDEX time_boxsn_ndx (time,boxsn);


You can drop the other index

ALTER TABLE readings DROP INDEX boxsn_time_ndx;


You should definitely see a dramatic improvement in performance as the tables grow.

In your case,

  • The first EXPLAIN plan says you have to perform a lookup of SerialNumber for each row in readings against a list of value in memory



  • The second EXPLAIN plan says you have to perform a lookup of SerialNumber for each row in readings against a table.



UPDATE 2012-01-12 14:03 EDT

I refactored it again to make sure the readings keys and boards keys are combined correctly before retrieving the data from the readings table:

SELECT 
    readings.* 
FROM 
    ( 
        SELECT A.* FROM
        (
            SELECT boxsn FROM readings 
            WHERE (time >= 1325404800)  
            AND (time < 1326317400)  
            ORDER BY `time` ASC
        ) A
        LEFT JOIN
        (
            SELECT id AS boxsn
            FROM boards
            WHERE siteId = '1'
        ) B
        USING (boxsn)
        WHERE B.boxsn IS NOT NULL
    ) readings_keys 
    LEFT JOIN readings 
    USING (boxsn) 
;

Code Snippets

SELECT
    readings.*
FROM
    (
        SELECT boxsn FROM readings
        WHERE (time >= 1325404800) 
        AND (time < 1326317400) 
        ORDER BY `time` ASC
    ) readings_keys
    LEFT JOIN
    (
        SELECT id AS boxsn FROM boards WHERE siteId = '1'
    ) boards
    USING (boxsn)
    LEFT JOIN readings
    USING (boxsn)
;
ALTER TABLE boards ADD INDEX siteId_id_ndx (siteId,id);
ALTER TABLE readings ADD INDEX time_boxsn_ndx (time,boxsn);
ALTER TABLE readings DROP INDEX boxsn_time_ndx;
SELECT 
    readings.* 
FROM 
    ( 
        SELECT A.* FROM
        (
            SELECT boxsn FROM readings 
            WHERE (time >= 1325404800)  
            AND (time < 1326317400)  
            ORDER BY `time` ASC
        ) A
        LEFT JOIN
        (
            SELECT id AS boxsn
            FROM boards
            WHERE siteId = '1'
        ) B
        USING (boxsn)
        WHERE B.boxsn IS NOT NULL
    ) readings_keys 
    LEFT JOIN readings 
    USING (boxsn) 
;

Context

StackExchange Database Administrators Q#10554, answer score: 7

Revisions (0)

No revisions yet.