How to speed up queries on a large 220 million rows table (9 gig data)?
Problem
The issue:
We have a social site where members can rate each other for compatibility or matching. This
user_match_ratings table contains over 220 million rows (9 gigs of data, almost 20 gigs in indexes). Queries against this table routinely show up in slow.log (threshold > 2 seconds), and this is the most frequently logged slow query in the system:

```
Query_time: 3  Lock_time: 0  Rows_sent: 3  Rows_examined: 1051
select rating, count(*) as tally from user_match_ratings where rated_user_id = 395357 group by rating;

Query_time: 4  Lock_time: 0  Rows_sent: 3  Rows_examined: 1294
select rating, count(*) as tally from user_match_ratings where rated_user_id = 4182969 group by rating;

Query_time: 3  Lock_time: 0  Rows_sent: 3  Rows_examined: 446
select rating, count(*) as tally from user_match_ratings where rated_user_id = 630148 group by rating;

Query_time: 5  Lock_time: 0  Rows_sent: 3  Rows_examined: 3788
select rating, count(*) as tally from user_match_ratings where rated_user_id = 1835698 group by rating;

Query_time: 17  Lock_time: 0  Rows_sent: 3  Rows_examined: 4311
select rating, count(*) as tally from user_match_ratings where rated_user_id = 1269322 group by rating;
```

MySQL version:
- protocol version: 10
- version: 5.0.77-log
- version bdb: Sleepycat Software: Berkeley DB 4.1.24: (January 29, 2009)
- version_compile_machine: x86_64
- version_compile_os: redhat-linux-gnu
Table info:
```
SHOW COLUMNS FROM user_match_ratings;
```

Gives:
╔═══════════════╦════════════╦══════╦═════╦═════════╦════════════════╗
║ Field         ║ Type       ║ Null ║ Key ║ Default ║ Extra          ║
╠═══════════════╬════════════╬══════╬═════╬═════════╬════════════════╣
║ id            ║ int(11)    ║ NO   ║ PRI ║ NULL    ║ auto_increment ║
║ rater_user_id ║ int(11)    ║ NO   ║ MUL ║ NULL    ║                ║
║ rated_user_id ║ int(11)    ║ NO   ║ MUL ║ NULL    ║                ║
║ rating        ║ varchar(1) ║ NO   ║     ║ NULL    ║                ║
║ created_at    ║ datetime   ║ NO   ║     ║ NULL    ║                ║
╚═══════════════╩════════════╩══════╩═════╩═════════╩════════════════╝

Sample query:
```
select * from m
```
Solution
Thoughts on the issue, thrown in random order:
- The obvious index for this query is (rated_user_id, rating); see the sketch after this list. A query that gets data for only one of the millions of users and still needs 17 seconds is doing something wrong: it reads from the (rated_user_id, rater_user_id) index and then fetches from the table the (hundreds to thousands of) values of the rating column, because rating is not in any index. So the query has to read many rows of the table, which are located in many different disk locations.
- Before adding numerous indexes to the tables, try to analyze the performance of the whole database and the full set of slow queries, and re-examine your choice of datatypes, the storage engine you use, and the configuration settings.
- Consider moving to a newer version of MySQL: 5.1, 5.5, or even 5.6 (also: the Percona and MariaDB versions). There are several benefits: bugs have been corrected, the optimizer has been improved, and you can set the slow-query threshold below 1 second (for example, to 10 milliseconds; a sketch follows this list). This will give you far better information about slow queries.
- The choice of datatype for rating is weird. VARCHAR(1)? Why not CHAR(1)? Why not TINYINT? Either would save you some space, both in the table and in the indexes that (will) include that column. A VARCHAR(1) column needs one byte more than CHAR(1), and if the columns are utf8, the (VAR)CHAR variants need 3 (or 4) bytes instead of the single byte a TINYINT needs. A possible conversion is sketched after this list.
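For the first point, a minimal sketch of adding the covering index (the index name idx_rated_rating is an assumption, not from the original answer):

```
-- Covering index: with (rated_user_id, rating) the GROUP BY query
-- can be answered entirely from the index, without touching table rows.
-- On a 220-million-row table this will take a while to build and,
-- on MySQL 5.0, locks the table while it runs.
ALTER TABLE user_match_ratings
    ADD INDEX idx_rated_rating (rated_user_id, rating);

-- Verify that EXPLAIN now shows this index with "Using index":
EXPLAIN
SELECT rating, COUNT(*) AS tally
FROM user_match_ratings
WHERE rated_user_id = 395357
GROUP BY rating;
```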
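For the slow-query threshold, a sketch assuming you have upgraded first: sub-second values of long_query_time require MySQL 5.1.21 or later, and the 10 ms value is just the example from the point above.

```
-- Log any query slower than 10 milliseconds.
SET GLOBAL long_query_time = 0.01;
-- Make sure the slow query log is enabled (dynamic variable since 5.1).
SET GLOBAL slow_query_log = 'ON';
```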
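And for the rating datatype, one possible conversion. This is a sketch rather than part of the original answer; rewriting a 220-million-row table is slow, so test on a copy first.

```
-- CHAR(1) drops the one-byte length prefix that VARCHAR(1) carries.
ALTER TABLE user_match_ratings
    MODIFY rating CHAR(1) NOT NULL;

-- Alternatively, if the application can map ratings to small integers,
-- TINYINT stores them in one byte regardless of character set:
-- ALTER TABLE user_match_ratings
--     MODIFY rating TINYINT UNSIGNED NOT NULL;
```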
Context
StackExchange Database Administrators Q#39693, answer score: 38