patternsqlMinor
IMDB website query to find actors by time period
Viewed 0 times
websiteperiodimdbquerytimefindactors
Problem
I'm using data from a database from the IMDB website. The database consists of five relevant tables.
It's worth noting that the
Here's the prompt and my initial attempt at solving it. Any help in improving/speeding up the query would be greatly appreciated.
- Actor (id, fname, lname, gender)
- Movie (id, name, year, rank)
- Director (id, fname, lname)
- Cast (pid, mid, role)
- Movie_Director(did, mid)
It's worth noting that the
id column in Actor, Movie & Director tables is a key for the respective table.Cast.pid refers to Actor.id, and Cast.mid refers to Movie.id.Here's the prompt and my initial attempt at solving it. Any help in improving/speeding up the query would be greatly appreciated.
/* List all the actors who acted in at least
one film in 2nd half of the 19th century and
in at least one film in the 1st half of the 20th century */
SELECT DISTINCT a.fname, a.lname
FROM Actor a, Movie m1, Movie m2, Cast c1, Cast c2
WHERE c1.pid = a.id
AND c1.mid = m1.id
AND m1.year BETWEEN 1850 AND 1900
AND c2.pid = a.id
AND c2.mid = m2.id
AND m2.year BETWEEN 1901 AND 1950;Solution
I see some of the same problems as in your last query
Do not use single-letter aliases
Old style
I think you would benefit from reading about explicit
Vertical white space
I personally find queries much easier to read if you use line breaks between lists of columns/values/conditions, instead of writing them inline.
Column aliases
It's good practice to rename short/ambiguous/ugly column names to something more human-friendly while presenting the result set. To you it may not matter much, but if you were presenting this report to your boss, they may scratch their head at
That Devil
This is about Microsoft T-SQL, but some problems apply pretty much across the board in SQL.
What do BETWEEN and the devil have in common?
I'll make no bones about it: BETWEEN is evil. For one, the meaning of the word in English does not always match the meaning of the operator in T-SQL. In T-SQL, BETWEEN is an inclusive range - not everyone gets that. Sure, in casual conversation when someone says "between 3 and 6" the answer really could be 3, 4, 5 or 6; but other times, they really mean to restrict the set to only 4 or 5 (an exclusive range).
The reviewed script
Do not use single-letter aliases
a, m1, c1... How about an alias that helps you write the query, instead of one that saves a few characters? That's what they are for, after all.Old style
JOINI think you would benefit from reading about explicit
JOIN syntax instead of using the pre-ANSI-92 syntax.Vertical white space
I personally find queries much easier to read if you use line breaks between lists of columns/values/conditions, instead of writing them inline.
Column aliases
It's good practice to rename short/ambiguous/ugly column names to something more human-friendly while presenting the result set. To you it may not matter much, but if you were presenting this report to your boss, they may scratch their head at
fname and lname. The syntax for column aliases is the same as table aliases.That Devil
BETWEENBETWEEN is ambiguous. You should instead use logical operators.This is about Microsoft T-SQL, but some problems apply pretty much across the board in SQL.
What do BETWEEN and the devil have in common?
I'll make no bones about it: BETWEEN is evil. For one, the meaning of the word in English does not always match the meaning of the operator in T-SQL. In T-SQL, BETWEEN is an inclusive range - not everyone gets that. Sure, in casual conversation when someone says "between 3 and 6" the answer really could be 3, 4, 5 or 6; but other times, they really mean to restrict the set to only 4 or 5 (an exclusive range).
The reviewed script
SELECT DISTINCT
Actor.fname AS ActorFirstName,
Actor.lname AS ActorLastName,
FROM
Actor -- look no alias needed
INNER JOIN Cast AS OlderCast ON OlderCast.pid = Actor.id
INNER JOIN Movie AS OlderMovie ON OlderCast.mid = OlderMovie.id
INNER JOIN Cast AS NewerCast ON NewerCast.pid = Actor.id
INNER JOIN Movie AS NewerMovie ON NewerCast.mid = NewerMovie.id
WHERE
AND OlderMovie.year >= 1850
AND OlderMovie.year = 1901
AND NewerMovie.year <= 1950;Code Snippets
SELECT DISTINCT
Actor.fname AS ActorFirstName,
Actor.lname AS ActorLastName,
FROM
Actor -- look no alias needed
INNER JOIN Cast AS OlderCast ON OlderCast.pid = Actor.id
INNER JOIN Movie AS OlderMovie ON OlderCast.mid = OlderMovie.id
INNER JOIN Cast AS NewerCast ON NewerCast.pid = Actor.id
INNER JOIN Movie AS NewerMovie ON NewerCast.mid = NewerMovie.id
WHERE
AND OlderMovie.year >= 1850
AND OlderMovie.year <= 1900
AND NewerMovie.year >= 1901
AND NewerMovie.year <= 1950;Context
StackExchange Code Review Q#62080, answer score: 3
Revisions (0)
No revisions yet.