patternsqlMinor
Improving join query performance with limit and index
Viewed 0 times
withlimitqueryjoinperformanceandindeximproving
Problem
I have a query across two large tables. The first records the latest user activity at a location, the second is a dimension table with a natural primary key of location.
Table sizes here are roughly 100 millions rows in
Most users have < 1000 records in
For users with a lot of activity data, this query, even though it is two simple selects, can still be slow.
I would like to improve the query performance. Can this by done by adding an additional index?
As an alternative, is there a way to exploit an index to make a more limited query (such as below) more efficient than the full query?
EXPLAIN SELECT yields the following on these queries (for a user with 999 events, say)
```
+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+-------------+
| 1 | SIMPLE | user_location_rating | ref | k_userloc,k_locplat,k_usrplat | k_usrplat | 8 | const,const | 999 | Using index |
| 1 | SIMPLE | dim_location | eq_ref | PRIMARY | PRIMARY | 4 | user_location_rating.location_id | 1 | Using where |
+----+-------------+---
Table sizes here are roughly 100 millions rows in
user_location_rating and 10 million rows in dim_location.Most users have < 1000 records in
user_location_rating and for those users query performance is adequate.For users with a lot of activity data, this query, even though it is two simple selects, can still be slow.
I would like to improve the query performance. Can this by done by adding an additional index?
As an alternative, is there a way to exploit an index to make a more limited query (such as below) more efficient than the full query?
SELECT d.create_time
FROM user_location_rating f
JOIN dim_location d using(location_id)
WHERE f.user_id=?
AND f.platform=?
AND d.category=?;
SELECT d.create_time
FROM user_location_rating f
JOIN dim_location d using(location_id)
WHERE f.user_id=?
AND f.platform=?
AND d.category=?
ORDER BY d.create_time DESC
LIMIT 1000;EXPLAIN SELECT yields the following on these queries (for a user with 999 events, say)
```
+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+-------------+
| 1 | SIMPLE | user_location_rating | ref | k_userloc,k_locplat,k_usrplat | k_usrplat | 8 | const,const | 999 | Using index |
| 1 | SIMPLE | dim_location | eq_ref | PRIMARY | PRIMARY | 4 | user_location_rating.location_id | 1 | Using where |
+----+-------------+---
Solution
For the first table, an index on
For the second table, it's more complex. You might get adequate performance with the existing index of the primary key, at least for the first query. A possible improvement is a
You could experiment with a
The definitions of integer columns look weird. Why do you have
That will save space in the table and their indexes, both in disk and in memory usage.
Less space -> Less I/O -> faster queries
(platform, user_id, location_id) or (user_id, platform, location_id) would be best, for both queries. Good for you, the existing index k_usrplat is equivalent to the second index (InnoDB indexes silently include the PK columns). You can see that the index is indeed used by both queries.For the second table, it's more complex. You might get adequate performance with the existing index of the primary key, at least for the first query. A possible improvement is a
(category, location_id, create_time) index. You could experiment with a
(category, create_time, location_id) index as well, which might be useful for the second query. A lot will depend on the distribution of data and efficiency may vary for different parameters. The definitions of integer columns look weird. Why do you have
int (16) and int (2)? If you think that that means any restrictions on the possible values in the columns, you are mistaken. The columns are of the same type, the numbers in parentheses is just a directive to user interfaces which is mostly ignored. If some of these columns, for example platform can only hold small values (say 0-100 or 0-2000), then use an appropriately smaller type:tiny int (-128 .. +127) : 1 byte
small int (-32768 .. + 32767) : 2 bytes
medium int (-2^23 .. + 2^23-1) : 3 bytes
tiny int unsigned (0 .. +255) : 1 byte
small int unsigned (0 .. +65535) : 2 bytes
medium int unsigned (0 .. +2^24-1) : 3 bytesThat will save space in the table and their indexes, both in disk and in memory usage.
Less space -> Less I/O -> faster queries
Code Snippets
tiny int (-128 .. +127) : 1 byte
small int (-32768 .. + 32767) : 2 bytes
medium int (-2^23 .. + 2^23-1) : 3 bytes
tiny int unsigned (0 .. +255) : 1 byte
small int unsigned (0 .. +65535) : 2 bytes
medium int unsigned (0 .. +2^24-1) : 3 bytesContext
StackExchange Database Administrators Q#198719, answer score: 5
Revisions (0)
No revisions yet.