HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Count rows with max per group and additional condition

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsconditionpergroupwithmaxandcountadditional

Problem

I have the following table:

horse_main_id | race_id | horse_name | rating100 | race_result
---------------+---------+-------------------+-----------+-------------
23 | 159 | Angelology | 126 | 1
24 | 159 | Arwoc | 118 |
25 | 159 | Marlo Man | 118 |
26 | 159 | Hucknall | 113 |
27 | 159 | It's Electrifying | 110 | 2
28 | 159 | Monte Carlo | 107 |
29 | 159 | Cross Constance | 103 |
30 | 160 | Auld Burns | 119 |
31 | 160 | Diamond Jim | 117 |
33 | 160 | Livery | 115 |
34 | 160 | Paraggi | 115 |
35 | 160 | I'll'ava'alf | 110 | 2
36 | 160 | Jacks 'n' Kings | 108 |
32 | 160 | Fastnet Isle | 115 | 1

I would like to operate on groups of race_id, find all groups that have the max(rating100) for their group and race_result = 1, and count all the times this happens.

So this example should return only 1, as for race_id 159 group the max rating100 of 126 also has race_result = 1, but for race_id 160 group, the max rating100 is 119, but does not have race_result = 1.

This is the closest I've got (was told I should use window functions):

SELECT * FROM (
SELECT horse_name, race_id, race_result
, max(rating100) OVER (partition by race_id) AS max_rating
FROM horse_main) t
WHERE race_result = 1;

Solution

Since you need a subquery in either case, I would use a plain aggregate in the subquery (may be cheaper):

SELECT count(*)
FROM  (SELECT race_id, max(rating100) AS rating100
       FROM   horse_main
       GROUP  BY 1) x
JOIN   horse_main h USING (race_id, rating100)
WHERE  h.race_result = 1;


If there are many rows per race_id, it will be faster to get group-wise maxima with one of these techniques:

  • Optimize GROUP BY query to retrieve latest record per user



Or use the window function rank() - that's what you count effectively: all greatest rating100 per race_id with race_result = 1:

SELECT count(*)
FROM  (SELECT race_result, rank() OVER (PARTITION BY race_id ORDER BY rating100 DESC) AS rnk
       FROM   horse_main) x
WHERE  rnk = 1
AND    race_result = 1;


An index on (race_id, rating100 DESC, race_result DESC) should help performance with big tables.

Assuming rating100 is defined NOT NULL, else you need to add NULLS LAST in query and index.

Code Snippets

SELECT count(*)
FROM  (SELECT race_id, max(rating100) AS rating100
       FROM   horse_main
       GROUP  BY 1) x
JOIN   horse_main h USING (race_id, rating100)
WHERE  h.race_result = 1;
SELECT count(*)
FROM  (SELECT race_result, rank() OVER (PARTITION BY race_id ORDER BY rating100 DESC) AS rnk
       FROM   horse_main) x
WHERE  rnk = 1
AND    race_result = 1;

Context

StackExchange Database Administrators Q#139115, answer score: 4

Revisions (0)

No revisions yet.