HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Extracting certain records

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
certainrecordsextracting

Problem

The purpose of this SQL query would be to extract the record 10192 only, because the end_time is both in the past and its foreign counterpart in the Node table still has the status 1. I don't want to select the record 10111 because it hasn't expired.

I've just finished writing this SQL query, it works! Although I'm sure it can be written better and I'm interested to here what you have to say.

SELECT MAX(l.id) AS highestID, l.nid, n.status
                        FROM mbl AS l
                        INNER JOIN node AS n
                        ON l.nid = n.nid
                        WHERE n.status = 1
                        AND (   SELECT end_time 
                                FROM mbl
                                WHERE id = (    SELECT MAX(id) 
                                                FROM mbl
                                                WHERE nid = l.nid
                                                GROUP BY nid
                                            )
                            ) <= NOW()
                        GROUP BY l.nid


Mlb table data:

id  | nid | start_time | end_time

105 | 5 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00

589 | 52 | 20/07/2014 01:00:00 | 31/07/2015 01:32:00

10108 | 874 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00

10111 | 874 | 20/07/2014 01:00:00 | 24/07/2015 01:00:00 

10156 | 967 | 20/07/2012 01:00:00 | 24/07/2013 01:00:00

10161 | 967 | 20/07/2013 01:00:00 | 20/07/2014 01:00:00

10192 | 967 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00


Node table data:

nid  | status

5 | 0

52 | 1

874 | 1

967 | 1

Solution

First, let's go over some problems in your original query:

SELECT MAX(l.id) AS highestID, l.nid, n.status
..............
GROUP BY l.nid


... if there's more than one n.status for a given l.nid, you get a "random" row, decided by the optimizer (and not controllable by you). I'm presuming that you only have one row here (we'll exploit this in a bit).

(SELECT end_time 
 FROM mbl
 WHERE id = (SELECT MAX(id) 
             FROM mbl
             WHERE nid = l.nid
             GROUP BY nid)) <= NOW()


I'm presuming that id is still some sort of autogenerated value. The problem is, outside of some very specific contexts, the only thing a db-generated id should be used for is its uniqueness. Especially in situations where rows may be updated, max/min/ordering by ids rapidly becomes irrelevant. You seem to be attempting to make sure the most-recently-added row has an end_time value in the past - if you have some sort of inserted_at or similar timestamp, that would be a better column to use. It would also be helpful to know why you're looking for the greatest id in this case. (However, as I don't know the other columns in you table, I'm stuck with it for the moment).

It turns out that your statement can be reduced to this:

SELECT Most_Recent.highestId, Most_Recent.nid, Node.status
FROM (SELECT nid, MAX(id) AS highestId
      FROM Mbl
      GROUP BY nid) Most_Recent
JOIN Node
  ON Node.nid = Most_Recent.nid
JOIN Mbl
  ON Mbl.nid = Most_Recent.nid
     AND Mbl.id = Most_Recent.highestId
     AND Mbl.end_time <= NOW()


.... which not only eliminates a table reference, but your original version had issues with potential RBAR (row-by-agonizing-row)/Cartesian product plans. Note that we're still required to join to Mbl twice, with the second time being solely to make sure that the most-recent row also has an end_time value in the past (this can't be done in the subquery, because that would collect the most-recent-id of rows with end_time in the past)

Code Snippets

SELECT MAX(l.id) AS highestID, l.nid, n.status
..............
GROUP BY l.nid
(SELECT end_time 
 FROM mbl
 WHERE id = (SELECT MAX(id) 
             FROM mbl
             WHERE nid = l.nid
             GROUP BY nid)) <= NOW()
SELECT Most_Recent.highestId, Most_Recent.nid, Node.status
FROM (SELECT nid, MAX(id) AS highestId
      FROM Mbl
      GROUP BY nid) Most_Recent
JOIN Node
  ON Node.nid = Most_Recent.nid
JOIN Mbl
  ON Mbl.nid = Most_Recent.nid
     AND Mbl.id = Most_Recent.highestId
     AND Mbl.end_time <= NOW()

Context

StackExchange Code Review Q#58643, answer score: 6

Revisions (0)

No revisions yet.