HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Optimizing a query with many joins

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withqueryoptimizingmanyjoins

Problem

I have a query which selects a show from anime_series, then selects corresponding information from other tables (such as studio, which is stored as a SMALLINT reference to another table). The below query works as intended, but I really don't believe my code is as efficient as it can or should be.

The real complexity is that shows can and often do have multiple genres and studios. I'm current using GROUP_CONCAT on the multiple genres, studio names, and studio links, though I'm not sure this is the best method. I'm using a sub query for the next episode because I only need the first episode that hasn't aired, again, my method is probably over-complicating it.

SELECT
    anime.*,
    episode.number AS episode,
    episode.air_date AS ep_airdate,
    station.name AS station_name,
    station.link AS station_link,
    GROUP_CONCAT(DISTINCT genre.name ORDER BY LOWER(genre.name)) AS genres,
    GROUP_CONCAT(DISTINCT studio.name) AS studio_names,
    GROUP_CONCAT(DISTINCT studio.link) AS studio_links
FROM
    `anime_series` AS anime
LEFT JOIN (
    SELECT
        `air_date`,
        `series`,
        `number`
    FROM
        `anime_episodes`
    WHERE
        `air_date` > NOW()
    GROUP BY `series`) episode
    ON anime.id = episode.series
LEFT JOIN `anime_stations` station
    ON anime.station = station.id
LEFT JOIN `anime_genre_connections`
    ON anime_genre_connections.series_id = anime.id
    JOIN `anime_genres` AS genre
        ON anime_genre_connections.genre_id = genre.id
LEFT JOIN `anime_studio_connections`
    ON anime_studio_connections.series = anime.id
    JOIN `anime_studios` AS studio
        ON anime_studio_connections.studio = studio.id
WHERE anime.id = 1
GROUP BY anime.id;


Here's some table examples (anime_series missing irrelevant columns):

```
anime_series
id | station
1 | 1

anime_stations
id | name | link
1 | Something TV | http://example.com

anime_episodes
id | series | air_date | number
1 | 1 |

Solution

I would say the DISTINCT keywords in the following quote are superfluous. I would recommend to explicitely use INNER JOIN but even as it is, your JOIN clauses should implicitly compute that already. The only time I would use DISTINCT would be if I suspected that one anime had duplicate records of multiple of the same studio/genre.

GROUP_CONCAT(DISTINCT genre.name ORDER BY LOWER(genre.name)) AS genres,
GROUP_CONCAT(DISTINCT studio.name) AS studio_names,
GROUP_CONCAT(DISTINCT studio.link) AS studio_links


Your use of GROUP_CONCAT() should not cause much if any performance difference, as it computes that on the result set to display it differently. I'm curious about the reason you are using LEFT JOIN and joining a subquery, can you clarify?

Code Snippets

GROUP_CONCAT(DISTINCT genre.name ORDER BY LOWER(genre.name)) AS genres,
GROUP_CONCAT(DISTINCT studio.name) AS studio_names,
GROUP_CONCAT(DISTINCT studio.link) AS studio_links

Context

StackExchange Code Review Q#28478, answer score: 6

Revisions (0)

No revisions yet.