HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Simplify SQL query that utilizes GROUP BY and HAVING BY and uses aggregation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
utilizesgroupsqlqueryhavingusessimplifythatandaggregation

Problem

I am taking Stanford's Introduction to Databases Self-Paced online course. I have gone through the videos in the SQL mini-course, and I am having trouble completing the exercises.

The following is the question from the SQL Movie-Rating Query Exercises, Question 6:


For all cases where the same reviewer rated the same movie twice and
gave it a higher rating the second time, return the reviewer's name
and the title of the movie.

The database can be found here.

My answer to this question is very complex; so much so that I the length of my query exceeds the query character limit in class. I will post my solution in steps, since I think that might help surface some of my conceptual misunderstandings.

Step 1:Give me all the movies where the same reviewer reviewed the same movie twice

SELECT Rating.rID, Rating.mID 
FROM Rating
GROUP BY Rating.rID, Rating.mID
HAVING count(*) > 1;


Step 2: Let's join the three tables so that all the information is in one place:

SELECT Movie.mID, Movie.title, Movie.year, Movie.director, Rating.rID,
       Rating.stars, Rating.ratingDate, Reviewer.name 
FROM Movie, Rating, Reviewer
WHERE Movie.mID = Rating.mID and Rating.rID = Reviewer.rID;


Step 3: Filter the result set in Step 2 so that it only contains tuples that are also in Step 1

SELECT *
FROM 
(SELECT Movie.mID, Movie.title, Movie.year, Movie.director, Rating.rID,
       Rating.stars, Rating.ratingDate, Reviewer.name 
FROM Movie, Rating, Reviewer
WHERE Movie.mID = Rating.mID and Rating.rID = Reviewer.rID) compiled
WHERE compiled.mID in 
(SELECT Rating.mID 
FROM Rating
GROUP BY Rating.rID, Rating.mID
HAVING count(*) > 1) and compiled.rID in (SELECT Rating.rID 
FROM Rating
GROUP BY Rating.rID, Rating.mID
HAVING count(*) > 1);


Step 4: Return the result set from Step 3 twice so that we can compare them the rating, reviewer, movie, stars and date

```
SELECT view_one.name, view_one.title
FROM (SELECT *
FROM
(SELECT Movie.mID, Movie.title, Movie.yea

Solution

SELECT Rating.rID, Rating.mID 
FROM Rating
GROUP BY Rating.rID, Rating.mID
HAVING count(*) > 1;


This returns all ratings where the same person reviewed the same movie more than once, but that's not what we want. We want all the ratings where the same person reviewed the same movie twice, giving it a higher rating the second time.

SELECT DISTINCT S2.rID, S2.mID
FROM Rating S1 INNER JOIN Rating S2 ON S1.rID = S2.rID AND S1.mID = S2.mID
WHERE S1.ratingDate > S2.ratingDate AND S1.stars > S2.stars;


Now let's look at your step 2:

SELECT Movie.mID, Movie.title, Movie.year, Movie.director, Rating.rID,
       Rating.stars, Rating.ratingDate, Reviewer.name 
FROM Movie, Rating, Reviewer
WHERE Movie.mID = Rating.mID and Rating.rID = Reviewer.rID;


Why do we need so many columns? The assignment is to return the movie title and reviewer name, so let's just do that:

SELECT Movie.title, Reviewer.name 
FROM Movie
  INNER JOIN Rating S ON Movie.mID = S.mID
  INNER JOIN Reviewer ON S.rID = Reviewer.rID;


So we can finish with a new step 3:

SELECT Movie.title, Reviewer.name 
FROM Movie
  INNER JOIN (
    SELECT DISTINCT S2.rID, S2.mID
    FROM Rating S1 INNER JOIN Rating S2 ON S1.rID = S2.rID AND S1.mID = S2.mID
    WHERE S1.ratingDate > S2.ratingDate AND S1.stars > S2.stars
             ) S ON Movie.mID = S.mID
  INNER JOIN Reviewer ON S.rID = Reviewer.rID;


This uses a single subselect to replace Rating from step 2 with a new version that only includes those Reviewer/Movie pairs where the second rating was higher than the first.

Code Snippets

SELECT Rating.rID, Rating.mID 
FROM Rating
GROUP BY Rating.rID, Rating.mID
HAVING count(*) > 1;
SELECT DISTINCT S2.rID, S2.mID
FROM Rating S1 INNER JOIN Rating S2 ON S1.rID = S2.rID AND S1.mID = S2.mID
WHERE S1.ratingDate > S2.ratingDate AND S1.stars > S2.stars;
SELECT Movie.mID, Movie.title, Movie.year, Movie.director, Rating.rID,
       Rating.stars, Rating.ratingDate, Reviewer.name 
FROM Movie, Rating, Reviewer
WHERE Movie.mID = Rating.mID and Rating.rID = Reviewer.rID;
SELECT Movie.title, Reviewer.name 
FROM Movie
  INNER JOIN Rating S ON Movie.mID = S.mID
  INNER JOIN Reviewer ON S.rID = Reviewer.rID;
SELECT Movie.title, Reviewer.name 
FROM Movie
  INNER JOIN (
    SELECT DISTINCT S2.rID, S2.mID
    FROM Rating S1 INNER JOIN Rating S2 ON S1.rID = S2.rID AND S1.mID = S2.mID
    WHERE S1.ratingDate > S2.ratingDate AND S1.stars > S2.stars
             ) S ON Movie.mID = S.mID
  INNER JOIN Reviewer ON S.rID = Reviewer.rID;

Context

StackExchange Code Review Q#79916, answer score: 2

Revisions (0)

No revisions yet.