HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to properly implement compound greatest-n filtering

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
compoundimplementgreatestproperlyhowfiltering

Problem

Yep, more greatest-n-per-group questions.

Given the a table releases with the following columns:

id         | primary key                 | 
 volume     | double precision            |
 chapter    | double precision            |
 series     | integer-foreign-key         |
 include    | boolean                     | not null


I want to select the compound max of volume, then chapter for a set of series.

Right now, if I query per-distinct-series, I can easily accomplish this as follows:

SELECT 
       releases.chapter AS releases_chapter,
       releases.include AS releases_include,
       releases.series AS releases_series
FROM releases
WHERE releases.series = 741
  AND releases.include = TRUE
ORDER BY releases.volume DESC NULLS LAST, releases.chapter DESC NULLS LAST LIMIT 1;


However, if I have a large set of series (and I do), this quickly runs into efficiency issues where I'm issuing 100+ queries to generate a single page.

I'd like to roll the whole thing into a single query, where I can simply say WHERE releases.series IN (1,2,3....), but I haven't figured out how to convince Postgres to let me do that.

The naive approach would be:

```
SELECT releases.volume AS releases_volume,
releases.chapter AS releases_chapter,
releases.series AS releases_series
FROM
releases
WHERE
releases.series IN (12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499,
556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137,
1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768,
1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255,
2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606,
2634, 2636, 2695, 2696 )
AND releases.include = TRUE
GROUP BY
releases_series
ORDER BY releases.volume DESC N

Solution

The simple solution in Postgres is with DISTINCT ON:

SELECT DISTINCT ON (r.series)
       r.volume  AS releases_volume
     , r.chapter AS releases_chapter
     , r.series  AS releases_series
FROM   releases r
WHERE  r.series IN (
    12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499
  , 556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137
  , 1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768
  , 1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255
  , 2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606
  , 2634, 2636, 2695, 2696)
AND    r.include
ORDER  BY r.series, r.volume DESC NULLS LAST, r.chapter DESC NULLS LAST;


Details:

  • Select first row in each GROUP BY group?



Depending on data distribution there may be faster techniques:

  • Optimize GROUP BY query to retrieve latest record per user



Also, there are faster alternatives for long lists than IN ().

Combining an unnested array with a LATERAL join:

SELECT r.*
FROM   unnest('{12, 17, 44, 79, 88, 110, 129}'::int[]) t(i)  -- or many more items
     , LATERAL (
   SELECT volume  AS releases_volume
        , chapter AS releases_chapter
        , series  AS releases_series
   FROM   releases
   WHERE  series = t.i 
   AND    include
   ORDER  BY series, volume DESC NULLS LAST, chapter DESC NULLS LAST
   LIMIT  1
   ) r;


Is often faster. For best performance you need a matching multicolumn index like:

CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST);


Related:

  • Extremely slow query on indexed column



And if there are more than a few rows where include is not true, while you are only interested in the rows with include = true, then consider a partial multicolumn index:

CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST)
WHERE include;

Code Snippets

SELECT DISTINCT ON (r.series)
       r.volume  AS releases_volume
     , r.chapter AS releases_chapter
     , r.series  AS releases_series
FROM   releases r
WHERE  r.series IN (
    12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499
  , 556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137
  , 1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768
  , 1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255
  , 2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606
  , 2634, 2636, 2695, 2696)
AND    r.include
ORDER  BY r.series, r.volume DESC NULLS LAST, r.chapter DESC NULLS LAST;
SELECT r.*
FROM   unnest('{12, 17, 44, 79, 88, 110, 129}'::int[]) t(i)  -- or many more items
     , LATERAL (
   SELECT volume  AS releases_volume
        , chapter AS releases_chapter
        , series  AS releases_series
   FROM   releases
   WHERE  series = t.i 
   AND    include
   ORDER  BY series, volume DESC NULLS LAST, chapter DESC NULLS LAST
   LIMIT  1
   ) r;
CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST);
CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST)
WHERE include;

Context

StackExchange Database Administrators Q#114407, answer score: 3

Revisions (0)

No revisions yet.