HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

IMDB website query to find actors by time period

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
websiteperiodimdbquerytimefindactors

Problem

I'm using data from a database from the IMDB website. The database consists of five relevant tables.

  • Actor (id, fname, lname, gender)



  • Movie (id, name, year, rank)



  • Director (id, fname, lname)



  • Cast (pid, mid, role)



  • Movie_Director(did, mid)



It's worth noting that the id column in Actor, Movie & Director tables is a key for the respective table.

Cast.pid refers to Actor.id, and Cast.mid refers to Movie.id.

Here's the prompt and my initial attempt at solving it. Any help in improving/speeding up the query would be greatly appreciated.

/* List all the actors who acted in at least 
   one film in 2nd half of the 19th century and 
   in at least one film in the 1st half of the 20th century */  

    SELECT DISTINCT a.fname, a.lname
    FROM Actor a, Movie m1, Movie m2, Cast c1, Cast c2 
    WHERE c1.pid = a.id
    AND c1.mid = m1.id 
    AND m1.year BETWEEN 1850 AND 1900
    AND c2.pid = a.id
    AND c2.mid = m2.id
    AND m2.year BETWEEN 1901 AND 1950;

Solution

I see some of the same problems as in your last query

Do not use single-letter aliases

a, m1, c1... How about an alias that helps you write the query, instead of one that saves a few characters? That's what they are for, after all.

Old style JOIN

I think you would benefit from reading about explicit JOIN syntax instead of using the pre-ANSI-92 syntax.

Vertical white space

I personally find queries much easier to read if you use line breaks between lists of columns/values/conditions, instead of writing them inline.

Column aliases

It's good practice to rename short/ambiguous/ugly column names to something more human-friendly while presenting the result set. To you it may not matter much, but if you were presenting this report to your boss, they may scratch their head at fname and lname. The syntax for column aliases is the same as table aliases.

That Devil BETWEEN

BETWEEN is ambiguous. You should instead use logical operators.

This is about Microsoft T-SQL, but some problems apply pretty much across the board in SQL.

What do BETWEEN and the devil have in common?

I'll make no bones about it: BETWEEN is evil. For one, the meaning of the word in English does not always match the meaning of the operator in T-SQL. In T-SQL, BETWEEN is an inclusive range - not everyone gets that. Sure, in casual conversation when someone says "between 3 and 6" the answer really could be 3, 4, 5 or 6; but other times, they really mean to restrict the set to only 4 or 5 (an exclusive range).

The reviewed script

SELECT DISTINCT 
    Actor.fname AS ActorFirstName, 
    Actor.lname AS ActorLastName,
FROM 
    Actor -- look no alias needed
    INNER JOIN Cast AS OlderCast ON OlderCast.pid = Actor.id
    INNER JOIN Movie AS OlderMovie ON OlderCast.mid = OlderMovie.id
    INNER JOIN Cast AS NewerCast ON NewerCast.pid = Actor.id
    INNER JOIN Movie AS NewerMovie ON NewerCast.mid = NewerMovie.id
WHERE
    AND OlderMovie.year >= 1850 
    AND OlderMovie.year = 1901
    AND NewerMovie.year <= 1950;

Code Snippets

SELECT DISTINCT 
    Actor.fname AS ActorFirstName, 
    Actor.lname AS ActorLastName,
FROM 
    Actor -- look no alias needed
    INNER JOIN Cast AS OlderCast ON OlderCast.pid = Actor.id
    INNER JOIN Movie AS OlderMovie ON OlderCast.mid = OlderMovie.id
    INNER JOIN Cast AS NewerCast ON NewerCast.pid = Actor.id
    INNER JOIN Movie AS NewerMovie ON NewerCast.mid = NewerMovie.id
WHERE
    AND OlderMovie.year >= 1850 
    AND OlderMovie.year <= 1900
    AND NewerMovie.year >= 1901
    AND NewerMovie.year <= 1950;

Context

StackExchange Code Review Q#62080, answer score: 3

Revisions (0)

No revisions yet.