snippetsqlMinor
How can I order the results from a DISTINCT ON query
Viewed 0 times
distinctcanorderthequeryhowresultsfrom
Problem
I have a table with the equivalent of:
I want to get the latest
After som Googling I find that
Now, I actually want the data sorted on
One solution that comes to mind is to wrap all of this in an outer
(Or, am I in fact not even close...)
group_id integer REFERENCES
item_id integer REFERECNES
timestamp date
data text
sort_order textI want to get the latest
data for every item before some date. I.e. the maximum timestamp less than some value. I also want these rows sorted on sort_order.After som Googling I find that
SELECT DISTINCT ON is what I want, so I get something likeSELECT DISTINCT ON (item_id)
data
FROM [this table]
WHERE group_id = [some value]
AND timestamp < [some value]
ORDER BY item_id, timestamp DESCNow, I actually want the data sorted on
sort_order, but DISTINCT ON requires the ORDER BY to be listed first.One solution that comes to mind is to wrap all of this in an outer
SELECT that will ORDER BY sort_order. Is that a good efficient solution, or is there a better "correct" way to do this.(Or, am I in fact not even close...)
Solution
Wrapping the query in a derived table is the obvious way to solve this:
If your original query is efficient and doesn't return too many rows, the additional sort will not add much cost to the query.
Besides
Most notable method is using
-
Optimize
-
Optimize groupwise maximum query
-
Select first row in each
-
Query last N related rows per row
-
Optimise a
SELECT *
FROM
( SELECT DISTINCT ON (item_id)
item_id, data, timestamp, sort_order, ...
FROM [this table]
WHERE group_id = [some value]
AND timestamp < [some value]
ORDER BY item_id, timestamp DESC
) AS t
ORDER BY sort_order ;If your original query is efficient and doesn't return too many rows, the additional sort will not add much cost to the query.
Besides
DISTINCT ON, there are a few other methods that are often more efficient in this type of "greatest-n-pre-group" queries - but efficiency depends on many parameters, like targeted indexes and distribution of values (how many distinct item_id in the table, how many rows per item_id), Postgres version, etc. Most notable method is using
LATERAL subqueries. See these excellent answers by Erwin Brandstetter:-
Optimize
GROUP BY query to retrieve latest record per user-
Optimize groupwise maximum query
-
Select first row in each
GROUP BY group?-
Query last N related rows per row
-
Optimise a
LATERAL JOIN query on a big tableCode Snippets
SELECT *
FROM
( SELECT DISTINCT ON (item_id)
item_id, data, timestamp, sort_order, ...
FROM [this table]
WHERE group_id = [some value]
AND timestamp < [some value]
ORDER BY item_id, timestamp DESC
) AS t
ORDER BY sort_order ;Context
StackExchange Database Administrators Q#207894, answer score: 3
Revisions (0)
No revisions yet.