snippetsqlMinor
How can I improve the speed of a query on a 20 million+ row table?
Viewed 0 times
millioncanthequeryimprovehowrowspeedtable
Problem
I have a query that is used for getting internet traffic statistics of certain IP addresses.
There are separate IP address fields for
The query results are grouped on the time column, and the total SUMs in and out of these 5 minute intervals are used to plot a graph.
The table is called
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | traffic | range | basic_select | basic_select | 8 | NULL | 891319 | Using where; Using temporary; Using filesort |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
show indexes from traffic;
+-----
There are separate IP address fields for
hosts and blocks of IPs called assignments. The data is stored in 5 minute intervals. The query results are grouped on the time column, and the total SUMs in and out of these 5 minute intervals are used to plot a graph.
The table is called
traffic and contains (at the end of the month) around 21 million records. SHOW CREATE table traffic:
CREATE TABLE traffic (
type enum('v4_assignment','v4_host','v6_subnet','v6_assignment','v6_host') NOT NULL,
type_id int(11) unsigned NOT NULL,
time int(32) unsigned NOT NULL,
bytesin bigint(20) unsigned NOT NULL default '0',
bytesout bigint(20) unsigned NOT NULL default '0',
KEY basic_select (type_id,time,type)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT traffic.time, SUM(traffic.bytesin), SUM(traffic.bytesout) FROM traffic
WHERE (
( traffic.type = 'v4_assignment' AND type_id IN (231, between 20 to 100 ids,265)) OR
( traffic.type = 'v4_host' AND type_id IN (131, ... a lot of ids... ,1506)))
AND traffic.time >= 1343772000 AND traffic.time
The following is explain output for the query above:
``+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | traffic | range | basic_select | basic_select | 8 | NULL | 891319 | Using where; Using temporary; Using filesort |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
show indexes from traffic;
+-----
Solution
- Table partitioning
Because of the [AND traffic.time >= 1343772000 AND traffic.time
- You will need to calculate the timestamp ranges for that, but it should not be hard to do.
- Ex: (1346450399 - 1343772000) / 60 / 60 / 24 =~ 31 days. So the max value for the partition holding the data for september (has also 31 days) will be:
1346450399 + ( 31 24 60 * 60)
- An unix to date calculator can be found here: http://www.onlineconversion.com/unix_time.htm
- Rewrite the query
Because the "OR" in your WHERE block, the optimizer will choose not to use the index defined.
Try to split the query in 2 selects, and make an union.
SELECT
traffic.time,
SUM(traffic.bytesin),
SUM(traffic.bytesout)
FROM
traffic
WHERE traffic.type LIKE 'v4_assignment'
AND type_id IN (1,2,3,4)
AND traffic.time >= 1343772000 AND traffic.time = 1343772000 AND traffic.time <= 1346450399
GROUP BY
traffic.time
ORDER BY
traffic.time- New index based on data cardinality
Based on your explain output, I don't see the index beeing used.
Maybe because the optimizer decides that it will be easier (cheaper) to make a full table scan then to follow the index.
Also, in your current index, the first column has a lower cardinality then the next 2 ones. The first column in any index should be the column with the best (max) cardinality.
Create a new index as:
MYSQL> CREATE INDEX MTIhai_traffic_idx1 ON traffic(time, type, type_id)Code Snippets
SELECT
traffic.time,
SUM(traffic.bytesin),
SUM(traffic.bytesout)
FROM
traffic
WHERE traffic.type LIKE 'v4_assignment'
AND type_id IN (1,2,3,4)
AND traffic.time >= 1343772000 AND traffic.time <= 1346450399
GROUP BY
traffic.time
UNION
SELECT
traffic.time,
SUM(traffic.bytesin),
SUM(traffic.bytesout)
FROM
traffic
WHERE traffic.type LIKE 'v4_host'
AND type_id IN (5,6,7,8)
AND traffic.time >= 1343772000 AND traffic.time <= 1346450399
GROUP BY
traffic.time
ORDER BY
traffic.timeMYSQL> CREATE INDEX MTIhai_traffic_idx1 ON traffic(time, type, type_id)Context
StackExchange Database Administrators Q#22979, answer score: 6
Revisions (0)
No revisions yet.