patternsqlMinor
MySQL IMDB Query Speed
Viewed 0 times
mysqlqueryspeedimdb
Problem
I'm using data from a database from the IMDB website. The database consists of five relevant tables.
It's worth noting that the
Here's the prompt and my initial attempt at solving it. Any help in improving/speeding up the query would be greatly appreciated.
- Actor (id, fname, lname, gender)
- Movie (id, name, year, rank)
- Director (id, fname, lname)
- Cast (pid, mid, role)
- Movie_Director (did, mid)
It's worth noting that the
id column in Actor, Movie & Director tables is a key for the respective table. Cast.pid refers to Actor.id, and Cast.mid refers to Movie.id. Here's the prompt and my initial attempt at solving it. Any help in improving/speeding up the query would be greatly appreciated.
/* For every pair of male and female actors that appear together in some film,
find the total number of films in which they appear together.
Sort the answers in decreasing order of the total number of films. */
SELECT a1.fname, a1.lname, a2.fname, a2.lname, count(*) AS num_films
FROM Actor AS a1, Actor AS a2, Cast AS c1, Cast AS c2
WHERE c1.mid = c2.mid AND
a1.id = c1.pid AND
a2.id = c2.pid AND
a1.gender = 'M' AND
a2.gender = 'F'
GROUP BY a1.id, a2.id, a1.fname, a1.lname, a2.fname, a2.lname
HAVING COUNT(*) > 0
ORDER BY COUNT(*) DESC;Solution
Alias notation
Your table aliases
In fact, given the nature of your query, you could give them even more useful names... how about
Old style
This
Instead:
Multiple counting
I see you
This:
Instead:
That should speed up your query a bit too.
Everything together
Your table aliases
a1/a2 and c1/c2 are not good. In this very short query it may not matter, however names for aliases, variables, parameters, etc. should be descriptive enough to say something about what they represent. Besides, with 4-5 character table names, is it really that much to type? I would use Actor1/Actor2 and Cast1/Cast2.In fact, given the nature of your query, you could give them even more useful names... how about
MaleAct/MaleCast and FemAct/FemCast!Old style
JOINThis
JOIN syntax should not be used:FROM Actor AS a1, Actor AS a2, Cast AS c1, Cast AS c2
WHERE c1.mid = c2.mid AND
a1.id = c1.pid AND
a2.id = c2.pid ANDInstead:
FROM Actor AS MaleAct
INNER JOIN Cast AS MaleCast ON MaleAct.id = MaleCast.pid
INNER JOIN Cast AS FemCast ON MaleCast.id = FemCast.id
INNER JOIN Actor AS FemAct ON FemAct.id = FemCast.pidMultiple counting
I see you
COUNT() 3 times within your query. Since you're already selecting the COUNT() into your result set, and MySQL will allow you to use column aliases in GROUP BY, ORDER BY, or HAVING clauses.This:
HAVING COUNT(*) > 0
ORDER BY COUNT(*) DESC;Instead:
HAVING num_films > 0
ORDER BY num_films DESC;That should speed up your query a bit too.
Everything together
SELECT MaleAct.fname, MaleAct.lname, FemAct.fname, FemAct.lname, count(*) AS num_films
FROM Actor AS MaleAct
INNER JOIN Cast AS MaleCast ON MaleAct.id = MaleCast.pid
INNER JOIN Cast AS FemCast ON MaleCast.id = FemCast.id
INNER JOIN Actor AS FemAct ON FemAct.id = FemCast.pid
WHERE
MaleAct.gender = 'M' AND
FemAct.gender = 'F'
GROUP BY MaleAct.id, FemAct.id, MaleAct.fname, MaleAct.lname, FemAct.fname, FemAct.lname,
HAVING num_films > 0
ORDER BY num_films DESC;Code Snippets
FROM Actor AS a1, Actor AS a2, Cast AS c1, Cast AS c2
WHERE c1.mid = c2.mid AND
a1.id = c1.pid AND
a2.id = c2.pid ANDFROM Actor AS MaleAct
INNER JOIN Cast AS MaleCast ON MaleAct.id = MaleCast.pid
INNER JOIN Cast AS FemCast ON MaleCast.id = FemCast.id
INNER JOIN Actor AS FemAct ON FemAct.id = FemCast.pidHAVING COUNT(*) > 0
ORDER BY COUNT(*) DESC;HAVING num_films > 0
ORDER BY num_films DESC;SELECT MaleAct.fname, MaleAct.lname, FemAct.fname, FemAct.lname, count(*) AS num_films
FROM Actor AS MaleAct
INNER JOIN Cast AS MaleCast ON MaleAct.id = MaleCast.pid
INNER JOIN Cast AS FemCast ON MaleCast.id = FemCast.id
INNER JOIN Actor AS FemAct ON FemAct.id = FemCast.pid
WHERE
MaleAct.gender = 'M' AND
FemAct.gender = 'F'
GROUP BY MaleAct.id, FemAct.id, MaleAct.fname, MaleAct.lname, FemAct.fname, FemAct.lname,
HAVING num_films > 0
ORDER BY num_films DESC;Context
StackExchange Code Review Q#62056, answer score: 4
Revisions (0)
No revisions yet.