HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How can I improve the speed of a query on a 20 million+ row table?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
millioncanthequeryimprovehowrowspeedtable

Problem

I have a query that is used for getting internet traffic statistics of certain IP addresses.

There are separate IP address fields for hosts and blocks of IPs called assignments. The data is stored in 5 minute intervals.

The query results are grouped on the time column, and the total SUMs in and out of these 5 minute intervals are used to plot a graph.

The table is called traffic and contains (at the end of the month) around 21 million records.

SHOW CREATE table traffic:
CREATE TABLE
traffic (
type enum('v4_assignment','v4_host','v6_subnet','v6_assignment','v6_host') NOT NULL,
type_id int(11) unsigned NOT NULL,
time int(32) unsigned NOT NULL,
bytesin bigint(20) unsigned NOT NULL default '0',
bytesout bigint(20) unsigned NOT NULL default '0',
KEY
basic_select (type_id,time,type)
) ENGINE=InnoDB DEFAULT CHARSET=latin1


SELECT traffic.time, SUM(traffic.bytesin), SUM(traffic.bytesout) FROM traffic
WHERE (
( traffic.type = 'v4_assignment' AND type_id IN (231, between 20 to 100 ids,265)) OR
( traffic.type = 'v4_host' AND type_id IN (131, ... a lot of ids... ,1506)))
AND traffic.time >= 1343772000 AND traffic.time

The following is
explain output for the query above:

``
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | traffic | range | basic_select | basic_select | 8 | NULL | 891319 | Using where; Using temporary; Using filesort |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+

show indexes from traffic;
+-----

Solution


  1. Table partitioning



Because of the [AND traffic.time >= 1343772000 AND traffic.time

  • You will need to calculate the timestamp ranges for that, but it should not be hard to do.



  • Ex: (1346450399 - 1343772000) / 60 / 60 / 24 =~ 31 days. So the max value for the partition holding the data for september (has also 31 days) will be:


1346450399 + ( 31 24 60 * 60)

  • An unix to date calculator can be found here: http://www.onlineconversion.com/unix_time.htm



  1. Rewrite the query



Because the "OR" in your WHERE block, the optimizer will choose not to use the index defined.
Try to split the query in 2 selects, and make an union.

SELECT 
    traffic.time, 
    SUM(traffic.bytesin), 
    SUM(traffic.bytesout) 
FROM 
    traffic 
WHERE traffic.type LIKE 'v4_assignment' 
    AND type_id IN (1,2,3,4)
    AND traffic.time >= 1343772000 AND traffic.time = 1343772000 AND traffic.time <= 1346450399 
GROUP BY 
    traffic.time
ORDER BY 
    traffic.time


  1. New index based on data cardinality



Based on your explain output, I don't see the index beeing used.
Maybe because the optimizer decides that it will be easier (cheaper) to make a full table scan then to follow the index.
Also, in your current index, the first column has a lower cardinality then the next 2 ones. The first column in any index should be the column with the best (max) cardinality.

Create a new index as:

MYSQL> CREATE INDEX MTIhai_traffic_idx1 ON traffic(time, type, type_id)

Code Snippets

SELECT 
    traffic.time, 
    SUM(traffic.bytesin), 
    SUM(traffic.bytesout) 
FROM 
    traffic 
WHERE traffic.type LIKE 'v4_assignment' 
    AND type_id IN (1,2,3,4)
    AND traffic.time >= 1343772000 AND traffic.time <= 1346450399 
GROUP BY 
    traffic.time
UNION
SELECT 
    traffic.time, 
    SUM(traffic.bytesin), 
    SUM(traffic.bytesout) 
FROM 
    traffic 
WHERE traffic.type LIKE 'v4_host' 
    AND type_id IN (5,6,7,8)
    AND traffic.time >= 1343772000 AND traffic.time <= 1346450399 
GROUP BY 
    traffic.time
ORDER BY 
    traffic.time
MYSQL> CREATE INDEX MTIhai_traffic_idx1 ON traffic(time, type, type_id)

Context

StackExchange Database Administrators Q#22979, answer score: 6

Revisions (0)

No revisions yet.