HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How can I optimize this MySQL query further?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thiscanqueryfurthermysqloptimizehow

Problem

I have a query that is taking a particularly long time to run (15+ seconds) and it is only getting worse with time as my dataset grows. I have optimized this in the past, and have added indices, code-level sorting and other optimizations, but it needs some further refining.

SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM `sounds` 
INNER JOIN ratings ON sounds.id = ratings.rateable_id 
WHERE (ratings.rateable_type = 'Sound' 
   AND sounds.blacklisted = false 
   AND sounds.ready_for_deployment = true 
   AND sounds.deployed = true 
   AND sounds.type = "Sound" 
   AND sounds.created_at > "2011-03-26 21:25:49") 
GROUP BY ratings.rateable_id


The query's purpose is to get me the sound id's and the average rating of the most recent, released sounds. There are about 1500 sounds, and 2 Million ratings.

I have several indices on sounds

```
mysql> show index from sounds;
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+————+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+————+
| sounds | 0 | PRIMARY | 1 | id | A | 1388 | NULL | NULL | | BTREE | |
| sounds | 1 | sounds_ready_for_deployment_and_deployed | 1 | deployed | A | 5 | NULL | NULL | YES | BTREE | |
| sounds | 1 | sounds_ready_for_deployment_and_deployed | 2 | ready_for_deployment | A | 12 | NULL | NULL | YES | BTREE |

Solution

After looking over the query, the tables, and the WHERE AND GROUP BY clauses, I recommend the following:

Recommendation #1) Refactor the Query

I reorganized the query to do three(3) things:

  • create smaller temp tables



  • Process the WHERE clause on those temp tables



  • Delay joining to the very last



Here is my proposed query:

SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);


Recommendation #2) Index the sounds table with an index that will accommodate the WHERE clause

The columns of this index include all the columns from the WHERE clause with static values first and moving target last

ALTER TABLE sounds ADD INDEX support_index
(blacklisted,ready_for_deployment,deployed,type,created_at);


I sincerely believe you will be pleasantly surprised. Give it a Try !!!

UPDATE 2011-05-21 19:04

I just saw the cardinality. OUCH !!! Cardinality of 1 for rateable_id. Boy, I feel stupid !!!

UPDATE 2011-05-21 19:20

Maybe making the index will be enough to improve things.

UPDATE 2011-05-21 22:56

Please run this:

EXPLAIN SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);


UPDATE 2011-05-21 23:34

I refactored it again. Try This One Please:

EXPLAIN
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
;


UPDATE 2011-05-21 23:55

I refactored it again. Try This One Please (Last Time):

EXPLAIN
  SELECT A.id,avg(B.rating) AS avg_rating, count(B.rating) AS votes FROM
  (
    SELECT BB.* FROM
    (
      SELECT id FROM sounds
      WHERE blacklisted = false 
      AND   ready_for_deployment = true 
      AND   deployed = true 
      AND   type = "Sound" 
      AND   created_at > '2011-03-26 21:25:49'
    ) AA INNER JOIN sounds BB USING (id)
  ) A INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) B
  ON A.id = B.rateable_id
  GROUP BY B.rateable_id;


UPDATE 2011-05-22 00:12

I hate giving up !!!!

EXPLAIN
  SELECT A.*,avg(B.rating) AS avg_rating, count(B.rating) AS votes FROM
  (
    SELECT BB.* FROM
    (
      SELECT id FROM sounds
      WHERE blacklisted = false 
      AND   ready_for_deployment = true 
      AND   deployed = true 
      AND   type = "Sound" 
      AND   created_at > '2011-03-26 21:25:49'
    ) AA INNER JOIN sounds BB USING (id)
  ) A,
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
    AND AAA.rateable_id = A.id
  ) B
  GROUP BY B.rateable_id;


UPDATE 2011-05-22 07:51

It has been bothering me that ratings is coming back with 2 million rows in the EXPLAIN. Then, it hit me. You might need another index on the ratings table which starts with rateable_type:

ALTER TABLE ratings ADD INDEX
rateable_type_rateable_id_ndx (rateable_type,rateable_id);


The goal of this index is to reduce the temp table that manipulates ratings so that it is less that 2 million. If we can get that temp table significantly smaller (at least half), then we can have a better hope in your query and mine working faster too.

After making that index, please Retry my original proposed query and also try yours:

```
SELECT
sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes
(
SELECT id FROM sounds
WHERE blacklisted = false
AND ready_for_deployment = true
AND deployed = true
AND type = "Sound"
AND created_at > '2011-03-26 21:25:49'
) AA INNER JOIN
(
SELECT AAA.ratings,AAA.rateable_id
FROM ratings AAA
WHERE rateable_type = 'Sound'
) BB
ON AA.id = BB.rateable_id
GROUP BY BB.rateable_i

Code Snippets

SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);
ALTER TABLE sounds ADD INDEX support_index
(blacklisted,ready_for_deployment,deployed,type,created_at);
EXPLAIN SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);
EXPLAIN
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
;
EXPLAIN
  SELECT A.id,avg(B.rating) AS avg_rating, count(B.rating) AS votes FROM
  (
    SELECT BB.* FROM
    (
      SELECT id FROM sounds
      WHERE blacklisted = false 
      AND   ready_for_deployment = true 
      AND   deployed = true 
      AND   type = "Sound" 
      AND   created_at > '2011-03-26 21:25:49'
    ) AA INNER JOIN sounds BB USING (id)
  ) A INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) B
  ON A.id = B.rateable_id
  GROUP BY B.rateable_id;

Context

StackExchange Database Administrators Q#2863, answer score: 7

Revisions (0)

No revisions yet.