HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Efficient query to get last row group by multiple columns

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
lastgroupcolumnsqueryefficientgetmultiplerow

Problem

I have a table like the following:

spread_id serial NOT NULL,
game_id integer NOT NULL,
sportsbook_id integer NOT NULL,
spread_type integer NOT NULL,
spread_duration integer NOT NULL,
home_line double precision,
home_odds integer,
away_line double precision,
away_odds integer,
update_time timestamp without time zone NOT NULL,
game_update_count integer NOT NULL


I'm trying to get the last row inserted (max update_time or game_update_count), for each group of sportsbook_id, spread_type, spread_duration, and game_id.

The following query gets me close, but I am not able to select the lines/odds without Postgres complaining.

SELECT 
     spreads.game_id, sportsbook_id, spread_type, spread_duration,
     MAX(game_update_count) AS game_update_count 
FROM spreads 
LEFT JOIN schedule ON 
        schedule.game_id = spreads.game_id 

WHERE date >= '2012-01-01' AND date <= '2012-01-02' 
GROUP BY 
     spreads.game_id, sportsbook_id, spread_type, spread_duration 
ORDER BY 
    spread_duration, spread_type, sportsbook_id, spreads.game_id,
    game_update_count DESC;


Anyone have any thoughts on a better approach to take?

Solution

The simplest way in Postgres is with DISTINCT ON:

SELECT DISTINCT ON (1,2,3,4)
       sp.game_id, sportsbook_id, spread_type, spread_duration, game_update_count
FROM   spreads sp
LEFT   JOIN schedule sch USING (game_id)
WHERE  date >= '2012-01-01'
AND    date <= '2012-01-02'
ORDER  BY 4,3,2,1, game_update_count DESC;


Details:

  • Select first row in each GROUP BY group?



The numbers are just syntax shorthand referring to the ordinal position of SELECT items.

If game_update_count can be NULL, you'll want game_update_count DESC NULLS LAST.

  • PostgreSQL sort by datetime asc, null first?

Code Snippets

SELECT DISTINCT ON (1,2,3,4)
       sp.game_id, sportsbook_id, spread_type, spread_duration, game_update_count
FROM   spreads sp
LEFT   JOIN schedule sch USING (game_id)
WHERE  date >= '2012-01-01'
AND    date <= '2012-01-02'
ORDER  BY 4,3,2,1, game_update_count DESC;

Context

StackExchange Database Administrators Q#89776, answer score: 4

Revisions (0)

No revisions yet.