snippetsqlMinor
How to properly implement compound greatest-n filtering
Viewed 0 times
compoundimplementgreatestproperlyhowfiltering
Problem
Yep, more greatest-n-per-group questions.
Given the a table
I want to select the compound max of volume, then chapter for a set of series.
Right now, if I query per-distinct-series, I can easily accomplish this as follows:
However, if I have a large set of
I'd like to roll the whole thing into a single query, where I can simply say
The naive approach would be:
```
SELECT releases.volume AS releases_volume,
releases.chapter AS releases_chapter,
releases.series AS releases_series
FROM
releases
WHERE
releases.series IN (12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499,
556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137,
1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768,
1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255,
2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606,
2634, 2636, 2695, 2696 )
AND releases.include = TRUE
GROUP BY
releases_series
ORDER BY releases.volume DESC N
Given the a table
releases with the following columns:id | primary key |
volume | double precision |
chapter | double precision |
series | integer-foreign-key |
include | boolean | not nullI want to select the compound max of volume, then chapter for a set of series.
Right now, if I query per-distinct-series, I can easily accomplish this as follows:
SELECT
releases.chapter AS releases_chapter,
releases.include AS releases_include,
releases.series AS releases_series
FROM releases
WHERE releases.series = 741
AND releases.include = TRUE
ORDER BY releases.volume DESC NULLS LAST, releases.chapter DESC NULLS LAST LIMIT 1;However, if I have a large set of
series (and I do), this quickly runs into efficiency issues where I'm issuing 100+ queries to generate a single page.I'd like to roll the whole thing into a single query, where I can simply say
WHERE releases.series IN (1,2,3....), but I haven't figured out how to convince Postgres to let me do that.The naive approach would be:
```
SELECT releases.volume AS releases_volume,
releases.chapter AS releases_chapter,
releases.series AS releases_series
FROM
releases
WHERE
releases.series IN (12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499,
556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137,
1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768,
1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255,
2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606,
2634, 2636, 2695, 2696 )
AND releases.include = TRUE
GROUP BY
releases_series
ORDER BY releases.volume DESC N
Solution
The simple solution in Postgres is with
Details:
Depending on data distribution there may be faster techniques:
Also, there are faster alternatives for long lists than
Combining an unnested array with a
Is often faster. For best performance you need a matching multicolumn index like:
Related:
And if there are more than a few rows where
DISTINCT ON:SELECT DISTINCT ON (r.series)
r.volume AS releases_volume
, r.chapter AS releases_chapter
, r.series AS releases_series
FROM releases r
WHERE r.series IN (
12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499
, 556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137
, 1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768
, 1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255
, 2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606
, 2634, 2636, 2695, 2696)
AND r.include
ORDER BY r.series, r.volume DESC NULLS LAST, r.chapter DESC NULLS LAST;Details:
- Select first row in each GROUP BY group?
Depending on data distribution there may be faster techniques:
- Optimize GROUP BY query to retrieve latest record per user
Also, there are faster alternatives for long lists than
IN ().Combining an unnested array with a
LATERAL join:SELECT r.*
FROM unnest('{12, 17, 44, 79, 88, 110, 129}'::int[]) t(i) -- or many more items
, LATERAL (
SELECT volume AS releases_volume
, chapter AS releases_chapter
, series AS releases_series
FROM releases
WHERE series = t.i
AND include
ORDER BY series, volume DESC NULLS LAST, chapter DESC NULLS LAST
LIMIT 1
) r;Is often faster. For best performance you need a matching multicolumn index like:
CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST);Related:
- Extremely slow query on indexed column
And if there are more than a few rows where
include is not true, while you are only interested in the rows with include = true, then consider a partial multicolumn index:CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST)
WHERE include;Code Snippets
SELECT DISTINCT ON (r.series)
r.volume AS releases_volume
, r.chapter AS releases_chapter
, r.series AS releases_series
FROM releases r
WHERE r.series IN (
12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499
, 556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137
, 1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768
, 1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255
, 2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606
, 2634, 2636, 2695, 2696)
AND r.include
ORDER BY r.series, r.volume DESC NULLS LAST, r.chapter DESC NULLS LAST;SELECT r.*
FROM unnest('{12, 17, 44, 79, 88, 110, 129}'::int[]) t(i) -- or many more items
, LATERAL (
SELECT volume AS releases_volume
, chapter AS releases_chapter
, series AS releases_series
FROM releases
WHERE series = t.i
AND include
ORDER BY series, volume DESC NULLS LAST, chapter DESC NULLS LAST
LIMIT 1
) r;CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST);CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST)
WHERE include;Context
StackExchange Database Administrators Q#114407, answer score: 3
Revisions (0)
No revisions yet.