patternsqlMinor
Are ORDER BY clause & ROW_NUMBER duplicating functionality when I need top n queries
Viewed 0 times
ordertopneedareduplicatingwhenrow_numberqueriesfunctionalityclause
Problem
I have a table that tracks
The query seems inefficient because it sorts the count first with
Update :
Adding data tables & SQL fiddle (actually violation column is redundant, simply an entry in discipline table means that there's been a violation. If I remove it, should the
Using PostgreSQL 9.3
violations for a student. I want to count the number of violations and select top 2 violators from each class. The query would look like thisSELECT *
FROM
( SELECT "people"."id", "name", "class",
ROW_NUMBER() OVER (PARTITION BY "class" ORDER BY COUNT("violation") DESC) AS "v"
FROM "people"
INNER JOIN "discipline" on ("discipline"."people_id" = "people"."id")
GROUP BY "people"."id", "name", "class"
) AS "v_table"
WHERE v < 3The query seems inefficient because it sorts the count first with
ORDER BY and then assigns a ROW_NUMBER. If I already have it sorted over a partition, how can I skip the ROW_NUMBER assignment & get the top 2. Update :
Adding data tables & SQL fiddle (actually violation column is redundant, simply an entry in discipline table means that there's been a violation. If I remove it, should the
Count be done on "discipline"."people_id"? like this )Discipline People
------------------------- --------------
id people_id violation id name class
1 1 True 1 Rob A
2 1 True 2 Jen B
3 2 True 3 Tom C
4 3 True 4 Ted A
5 4 True 5 Tim A
6 1 True ...
7 4 True
...Using PostgreSQL 9.3
Solution
First, to answer the questions implied in the comments, that the assignment of row numbers with the
This is needed because the COUNT numbers can be different for each partition (class). Since we want the 2 top numbers (for each class), we can't find a useful condition for that. With the row numbers, we can use the
In version 9.3, the
Tested at SQL-Fiddle
ROW_NUMBER() aggregate seems inefficient because we already have the COUNT(violation) numbers: This is needed because the COUNT numbers can be different for each partition (class). Since we want the 2 top numbers (for each class), we can't find a useful condition for that. With the row numbers, we can use the
WHERE v < 3 which gives us the top 2.In version 9.3, the
LATERAL joins were added in Postgres, which are similar to the CROSS and OUTER APPLY of SQL-Server. With this new kind of join, you can write a query that uses the COUNT numbers and a TOP 2 for each partition. Whether it is more or less efficient, you can test:WITH classes AS
( SELECT DISTINCT class
FROM people
)
-- if you have a "classes" table, skip the above lines
SELECT
v.id, v.name, c.class,
v.violations
FROM
classes AS c,
LATERAL
( SELECT p.id, p.name,
COUNT(d.violation) AS violations
FROM people AS p
INNER JOIN discipline AS d
ON d.people_id = p.id
WHERE p.class = c.class
GROUP BY p.id, p.name
ORDER BY violations DESC
LIMIT 2
) AS v
ORDER BY
c.class, v.violations DESC ;Tested at SQL-Fiddle
Code Snippets
WITH classes AS
( SELECT DISTINCT class
FROM people
)
-- if you have a "classes" table, skip the above lines
SELECT
v.id, v.name, c.class,
v.violations
FROM
classes AS c,
LATERAL
( SELECT p.id, p.name,
COUNT(d.violation) AS violations
FROM people AS p
INNER JOIN discipline AS d
ON d.people_id = p.id
WHERE p.class = c.class
GROUP BY p.id, p.name
ORDER BY violations DESC
LIMIT 2
) AS v
ORDER BY
c.class, v.violations DESC ;Context
StackExchange Database Administrators Q#71299, answer score: 4
Revisions (0)
No revisions yet.