HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How can I order the results from a DISTINCT ON query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
distinctcanorderthequeryhowresultsfrom

Problem

I have a table with the equivalent of:

group_id integer REFERENCES 
item_id integer REFERECNES 
timestamp date
data text
sort_order text


I want to get the latest data for every item before some date. I.e. the maximum timestamp less than some value. I also want these rows sorted on sort_order.

After som Googling I find that SELECT DISTINCT ON is what I want, so I get something like

SELECT DISTINCT ON (item_id)
         data
    FROM [this table]
   WHERE group_id = [some value]
     AND timestamp < [some value]
ORDER BY item_id, timestamp DESC


Now, I actually want the data sorted on sort_order, but DISTINCT ON requires the ORDER BY to be listed first.

One solution that comes to mind is to wrap all of this in an outer SELECT that will ORDER BY sort_order. Is that a good efficient solution, or is there a better "correct" way to do this.

(Or, am I in fact not even close...)

Solution

Wrapping the query in a derived table is the obvious way to solve this:

SELECT *
FROM
    ( SELECT DISTINCT ON (item_id)
             item_id, data, timestamp, sort_order, ... 
        FROM [this table]
       WHERE group_id = [some value]
         AND timestamp < [some value]
    ORDER BY item_id, timestamp DESC
   ) AS t
ORDER BY sort_order ;


If your original query is efficient and doesn't return too many rows, the additional sort will not add much cost to the query.

Besides DISTINCT ON, there are a few other methods that are often more efficient in this type of "greatest-n-pre-group" queries - but efficiency depends on many parameters, like targeted indexes and distribution of values (how many distinct item_id in the table, how many rows per item_id), Postgres version, etc.

Most notable method is using LATERAL subqueries. See these excellent answers by Erwin Brandstetter:

-
Optimize GROUP BY query to retrieve latest record per user

-
Optimize groupwise maximum query

-
Select first row in each GROUP BY group?

-
Query last N related rows per row

-
Optimise a LATERAL JOIN query on a big table

Code Snippets

SELECT *
FROM
    ( SELECT DISTINCT ON (item_id)
             item_id, data, timestamp, sort_order, ... 
        FROM [this table]
       WHERE group_id = [some value]
         AND timestamp < [some value]
    ORDER BY item_id, timestamp DESC
   ) AS t
ORDER BY sort_order ;

Context

StackExchange Database Administrators Q#207894, answer score: 3

Revisions (0)

No revisions yet.