How can I correctly choose the maximum number of occurrences of a string while grouping by another field?
Problem
I am using PostgreSQL 9.0. I have the following fields in a table: id, name.

id  name
1   John
1   Mary
1   Mary
1   Mary
1   John
1   Mary
3   Paul
3   Paul
3   George
.   .
.   .

For each id, I want to select the name that occurs the most. How can I do that?

I tried the following query, but it doesn't work:
select id, max(name)
from table
group by id;

Solution
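The attempted query does not count anything: MAX(name) returns the greatest name in sort order within each id. With the sample rows above this happens to coincide with the most frequent name (Mary for id 1, Paul for id 3), which can hide the problem. A minimal sketch with hypothetical rows shows the difference. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for PostgreSQL (MAX on a text column picks the greatest value in both); the table name tablex is also an assumption.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical table name: `table` itself is a reserved word.
conn.execute("CREATE TABLE tablex (id INTEGER, name TEXT)")
# Hypothetical rows: for id 2, 'Anna' is the most frequent name,
# but 'Zoe' sorts last alphabetically.
conn.executemany(
    "INSERT INTO tablex (id, name) VALUES (?, ?)",
    [(2, "Anna"), (2, "Anna"), (2, "Anna"), (2, "Zoe")],
)

# The question's approach: MAX() compares the names, it does not count them.
rows = conn.execute("SELECT id, MAX(name) FROM tablex GROUP BY id").fetchall()
print(rows)  # [(2, 'Zoe')]
```

So the query returns Zoe even though Anna occurs three times as often; the name has to be chosen by its count, not by its value.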
This isn't trivial. First, you want to group by id and name and count the rows:
SELECT COUNT(*)
...
GROUP BY id, name

Then select the maximum count for every id. One way to achieve this is with window functions. The RANK() function, used as

RANK() OVER (PARTITION BY id ORDER BY COUNT(*) DESC)

assigns a number to every row of the result (after the grouping is done), arranging the rows in partitions with the same id, ordered by COUNT(*) DESC. So for every (partition of) id, the row(s) with the maximum count are assigned a rank of 1. Thus we need to put the above in a derived table and use a WHERE condition to keep only these rows:

WHERE rnk = 1

The final query looks like this:
SELECT
    id, name, cnt
FROM
    ( SELECT id, name, COUNT(*) AS cnt,
             RANK() OVER (PARTITION BY id ORDER BY COUNT(*) DESC) AS rnk
      FROM tableX
      GROUP BY id, name
    ) AS tg
WHERE
    rnk = 1 ;

Tested at SQL-Fiddle.
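To try the final query locally, here is a minimal sketch that loads the question's sample rows and runs the query above. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (the first version with window functions) as a stand-in for PostgreSQL 9.0; the only change to the query is an added ORDER BY id, so the printed order is deterministic.

```python
import sqlite3

# In-memory SQLite (3.25+ for window functions) standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (id INTEGER, name TEXT)")
# The sample rows from the question.
conn.executemany(
    "INSERT INTO tableX (id, name) VALUES (?, ?)",
    [(1, "John"), (1, "Mary"), (1, "Mary"), (1, "Mary"), (1, "John"),
     (1, "Mary"), (3, "Paul"), (3, "Paul"), (3, "George")],
)

# Inner query: count each (id, name) pair and rank the pairs within each id
# by that count. Outer query: keep only the rank-1 rows.
query = """
SELECT id, name, cnt
FROM ( SELECT id, name, COUNT(*) AS cnt,
              RANK() OVER (PARTITION BY id ORDER BY COUNT(*) DESC) AS rnk
       FROM tableX
       GROUP BY id, name
     ) AS tg
WHERE rnk = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Mary', 4), (3, 'Paul', 2)]
```

For id 1 the grouped counts are John 2 and Mary 4, and for id 3 they are Paul 2 and George 1, so only Mary and Paul get rank 1.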
Note that if there are ties for first place (two or more names with the same maximum count), all of them will be returned. If you want strictly one row per id in the final results, you have to use ROW_NUMBER() instead of RANK() and possibly alter the ORDER BY clause to explicitly choose how ties are resolved:

ROW_NUMBER() OVER (PARTITION BY id ORDER BY COUNT(*) DESC, name ASC) AS rnk

Tested: SQL-Fiddle test-2.
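The difference between the two ranking functions only shows up when there is a tie. A minimal sketch with hypothetical tied rows (Ann and Bob each occur twice for id 4) runs the same derived-table query once with RANK() and once with ROW_NUMBER(). As before, Python's sqlite3 module and SQLite 3.25+ stand in for PostgreSQL, the helper top_names is hypothetical, and ORDER BY id, name is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (id INTEGER, name TEXT)")
# Hypothetical tied data: 'Ann' and 'Bob' both occur twice for id 4.
conn.executemany(
    "INSERT INTO tableX (id, name) VALUES (?, ?)",
    [(4, "Bob"), (4, "Ann"), (4, "Bob"), (4, "Ann")],
)

def top_names(ranking):
    """Run the derived-table query with `ranking` as the rnk expression.

    Hypothetical helper; the f-string is only used with the fixed
    expressions below, never with user input.
    """
    sql = f"""
    SELECT id, name, cnt
    FROM ( SELECT id, name, COUNT(*) AS cnt,
                  {ranking} AS rnk
           FROM tableX
           GROUP BY id, name ) AS tg
    WHERE rnk = 1
    ORDER BY id, name
    """
    return conn.execute(sql).fetchall()

# RANK(): both tied names get rank 1, so both are returned.
with_rank = top_names("RANK() OVER (PARTITION BY id ORDER BY COUNT(*) DESC)")
# ROW_NUMBER(): numbers are unique, and name ASC breaks the tie.
with_row_number = top_names(
    "ROW_NUMBER() OVER (PARTITION BY id ORDER BY COUNT(*) DESC, name ASC)"
)
print(with_rank)        # [(4, 'Ann', 2), (4, 'Bob', 2)]
print(with_row_number)  # [(4, 'Ann', 2)]
```

Without the name ASC tie-breaker, ROW_NUMBER() would still return one row per id, but which of the tied names wins would be unspecified.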
Context
StackExchange Database Administrators Q#30484, answer score: 9