HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Self Join Exercise. Have I over-complicated it?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
exercisejoinoverhavecomplicatedself

Problem

This my 3rd question on the same exercise, but by no means a duplicate. The two previous questions were posted on StackOverflow here and here.

Now I'm posting my Oracle solution (below) that works. I wonder if the same could've been simpler and/or much more efficient. Not in terms of CTE or analytic expressions but with basic simple logic.

Data:

CREATE TABLE Readings (
  user_id varchar(10),
  reading_time int ,
  x decimal(10,2),
  y decimal(10,2)
);

INSERT ALL
    INTO Readings (user_id, reading_time, x, y) VALUES ('u1', 60, 345, 400)
    INTO Readings (user_id, reading_time, x, y) VALUES ('u1', 100, 560, 300)
    INTO Readings (user_id, reading_time, x, y) VALUES ('u2', 35, 1024, 250)
    INTO Readings (user_id, reading_time, x, y) VALUES ('u1', 90, 450, 450)
    INTO Readings (user_id, reading_time, x, y) VALUES ('u3', 150, 600, 100)
    INTO Readings (user_id, reading_time, x, y) VALUES ('u3', 100, 500, 125)
SELECT * FROM dual;


Intermediate script:

SELECT r.user_id, rm.reading_time start_time, r.reading_time end_time, 
       (r.reading_time-rm.reading_time) time_spent,
       (TO_CHAR(rm.x)||' ; '||TO_CHAR(rm.y)) start_point, 
       (TO_CHAR(r.x)||' ; '||TO_CHAR(r.y)) end_point,
       SQRT(POWER(r.x-rm.x, 2)+POWER(r.x-rm.y, 2)) distance            
FROM Readings r 
JOIN Readings rm ON (r.user_id = rm.user_id and 
                     rm.reading_time = (SELECT MAX(r2.reading_time) FROM Readings r2 WHERE r2.reading_time < r.reading_time))
ORDER BY 1,2;


Final script:

```
SELECT rr.user_id, SUM(rr.distance) "Total Distance",
SUM(rr.time_spent) "Total Time", SUM(rr.distance)/SUM(rr.time_spent) "Average Speed"
FROM
(SELECT r.user_id, (r.reading_time-rm.reading_time) time_spent,
SQRT(POWER(r.x-rm.x, 2)+POWER(r.x-rm.y, 2)) distance
FROM Readings r
JOIN Readings rm ON (r.user_id = rm.user_id and
rm.reading_time = (SELECT MAX(r2.reading_time) FROM Readings r2 WHERE r2.reading_time < r.readin

Solution

-
You have a bug. Change the row ('u3', 100, 500, 125) to ('u3', 99, 500, 125), and try again. You will see that u3 row disappears.

You can fix it with the following minimal change:

rm.reading_time = (SELECT MAX(r2.reading_time) 
  FROM Readings r2 
  WHERE  r2.user_id=r.user_id -- you have to join on ids first
  and r2.reading_time < r.reading_time)


-
There is another bug here:
r.x-rm.y should have been r.y-rm.y.

-
user_id, reading_time is a unique key by problem definition. Having it declared will enable us to detect some invalid inputs. What is the average speed of an object that can be at more than one place?

-
I would go along with something like this for the intermediate script:

SELECT t.*, t2.*
FROM (SELECT r.user_id,r.reading_time, 
         max(r.x) AS x1, max(r.y) AS y1, 
         max(rm.reading_time) AS prev
FROM Readings r
JOIN Readings rm ON (r.user_id = rm.user_id AND rm.reading_time < r.reading_time)
GROUP BY r.user_id,r.reading_time) t
INNER JOIN Readings t2
ON t.user_id = t2.user_id AND t2.reading_time = t.prev;

Code Snippets

rm.reading_time = (SELECT MAX(r2.reading_time) 
  FROM Readings r2 
  WHERE  r2.user_id=r.user_id -- you have to join on ids first
  and r2.reading_time < r.reading_time)
SELECT t.*, t2.*
FROM (SELECT r.user_id,r.reading_time, 
         max(r.x) AS x1, max(r.y) AS y1, 
         max(rm.reading_time) AS prev
FROM Readings r
JOIN Readings rm ON (r.user_id = rm.user_id AND rm.reading_time < r.reading_time)
GROUP BY r.user_id,r.reading_time) t
INNER JOIN Readings t2
ON t.user_id = t2.user_id AND t2.reading_time = t.prev;

Context

StackExchange Code Review Q#23341, answer score: 2

Revisions (0)

No revisions yet.