HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Simple query is slow on 4M-rows table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowssimplequeryslowtable

Problem

I have in production a MySQL table pageviews with 4M rows that records page views of users on posts. I need to know which posts a specific user have read, but this request takes up to 15 seconds to execute:

SELECT post_id
FROM pageviews
WHERE user_id = 981
GROUP BY post_id


Here is the execution plan:

mysql> EXPLAIN SELECT post_id FROM visits WHERE user_id = 981 GROUP BY post_id;
+----+-------------+--------+------+---------------+---------+---------+-------+-------+----------------------------------------------+
| id | select_type | table  | type | possible_keys | key     | key_len | ref   | rows  | Extra                                        |
+----+-------------+--------+------+---------------+---------+---------+-------+-------+----------------------------------------------+
|  1 | SIMPLE      | visits | ref  | user_id       | user_id | 5       | const | 54696 | Using where; Using temporary; Using filesort |
+----+-------------+--------+------+---------------+---------+---------+-------+-------+----------------------------------------------+


I'm not sure how to look for the cause of the slowness: maybe the table is not well configured, the mysql server not well tuned, other queries locking stuff, ... Or maybe just 4M rows is a good size to start partitioning.

Production database is on Amazon RDS

CREATE TABLE pageviews (
id int(11) NOT NULL AUTO_INCREMENT,
user_id int(11) DEFAULT NULL,
post_id int(11) DEFAULT NULL,
created_at datetime NOT NULL,
PRIMARY KEY (id),
KEY post_id (post_id),
KEY user_id (user_id),
KEY created_at (created_at),
CONSTRAINT FK_444839EAA76ED395 FOREIGN KEY (user_id) REFERENCES users (id),
CONSTRAINT visits_ibfk_2 FOREIGN KEY (post_id) REFERENCES posts (id)
) ENGINE=InnoDB AUTO_INCREMENT=4587432 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Solution

In addition to @tombom's suggestions, creating an index on (user_id, post_id) instead of (or in addition, but the less indexes the better) separate indexes on user_id and post_id will simplify the query, probably getting rid of the filesort and temporary table, plus giving you the benefits of a covering index.

This will probably lower the query execution significantly if you have a large enough buffer pool and the query is relatively frequent.

If after doing that, the query is still slow, you will need to do (pre)caching in order to speed up the query execution.

Context

StackExchange Database Administrators Q#82657, answer score: 8

Revisions (0)

No revisions yet.