patternsqlMinor
Query performance with subquery and IN clause
Viewed 0 times
withquerysubqueryperformanceandclause
Problem
I am trying to select a range of data for multiple devices (unique serial numbers) from a historical table and was wondering why there is such a big difference in time for the following queries:
Basically I am trying to use the IN clause to indicate what items I want to fetch data for. If I "hard code" the items in the IN clause, the query is fast, if I use a subquery or join to select the items the performance is poor.
This query completes in 0.15s and returns 7382 rows.
The same query rewritten using a subquery to get the serial numbers takes over 30 seconds, and seems to spend most of its time in the Preparing state. It returns the same data as the first query.
The subquery returns the same values that are in the first query, but as stated, this takes a lot longer to run. Are they not functionally equivalent?
Here is the explain for both queries:
```
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------ +------+-----------------------------+
| 1 | SIMPLE | readings | range | PRIMARY,time | PRIMARY | 22 | NULL | 7339 | Using where; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
+----+--------------------+----------+-----------------+----------------+---------+-
Basically I am trying to use the IN clause to indicate what items I want to fetch data for. If I "hard code" the items in the IN clause, the query is fast, if I use a subquery or join to select the items the performance is poor.
This query completes in 0.15s and returns 7382 rows.
SELECT `readings`.* FROM `readings`
WHERE
(SerialNumber IN ('091146000121', *snip 25*, '091146000556'))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASCThe same query rewritten using a subquery to get the serial numbers takes over 30 seconds, and seems to spend most of its time in the Preparing state. It returns the same data as the first query.
SELECT `readings`.* FROM `readings`
WHERE
(SerialNumber IN (SELECT `boards`.`id` AS `SerialNumber` FROM `boards` WHERE (siteId = '1')))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASCThe subquery returns the same values that are in the first query, but as stated, this takes a lot longer to run. Are they not functionally equivalent?
Here is the explain for both queries:
```
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------ +------+-----------------------------+
| 1 | SIMPLE | readings | range | PRIMARY,time | PRIMARY | 22 | NULL | 7339 | Using where; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
+----+--------------------+----------+-----------------+----------------+---------+-
Solution
Refactor the query as follows:
Make sure you have the following indexes:
You can drop the other index
You should definitely see a dramatic improvement in performance as the tables grow.
In your case,
UPDATE 2012-01-12 14:03 EDT
I refactored it again to make sure the
SELECT
readings.*
FROM
(
SELECT boxsn FROM readings
WHERE (time >= 1325404800)
AND (time < 1326317400)
ORDER BY `time` ASC
) readings_keys
LEFT JOIN
(
SELECT id AS boxsn FROM boards WHERE siteId = '1'
) boards
USING (boxsn)
LEFT JOIN readings
USING (boxsn)
;Make sure you have the following indexes:
ALTER TABLE boards ADD INDEX siteId_id_ndx (siteId,id);
ALTER TABLE readings ADD INDEX time_boxsn_ndx (time,boxsn);You can drop the other index
ALTER TABLE readings DROP INDEX boxsn_time_ndx;You should definitely see a dramatic improvement in performance as the tables grow.
In your case,
- The first EXPLAIN plan says you have to perform a lookup of SerialNumber for each row in
readingsagainst a list of value in memory
- The second EXPLAIN plan says you have to perform a lookup of SerialNumber for each row in
readingsagainst a table.
UPDATE 2012-01-12 14:03 EDT
I refactored it again to make sure the
readings keys and boards keys are combined correctly before retrieving the data from the readings table:SELECT
readings.*
FROM
(
SELECT A.* FROM
(
SELECT boxsn FROM readings
WHERE (time >= 1325404800)
AND (time < 1326317400)
ORDER BY `time` ASC
) A
LEFT JOIN
(
SELECT id AS boxsn
FROM boards
WHERE siteId = '1'
) B
USING (boxsn)
WHERE B.boxsn IS NOT NULL
) readings_keys
LEFT JOIN readings
USING (boxsn)
;Code Snippets
SELECT
readings.*
FROM
(
SELECT boxsn FROM readings
WHERE (time >= 1325404800)
AND (time < 1326317400)
ORDER BY `time` ASC
) readings_keys
LEFT JOIN
(
SELECT id AS boxsn FROM boards WHERE siteId = '1'
) boards
USING (boxsn)
LEFT JOIN readings
USING (boxsn)
;ALTER TABLE boards ADD INDEX siteId_id_ndx (siteId,id);
ALTER TABLE readings ADD INDEX time_boxsn_ndx (time,boxsn);ALTER TABLE readings DROP INDEX boxsn_time_ndx;SELECT
readings.*
FROM
(
SELECT A.* FROM
(
SELECT boxsn FROM readings
WHERE (time >= 1325404800)
AND (time < 1326317400)
ORDER BY `time` ASC
) A
LEFT JOIN
(
SELECT id AS boxsn
FROM boards
WHERE siteId = '1'
) B
USING (boxsn)
WHERE B.boxsn IS NOT NULL
) readings_keys
LEFT JOIN readings
USING (boxsn)
;Context
StackExchange Database Administrators Q#10554, answer score: 7
Revisions (0)
No revisions yet.