patternsqlModerate
Is there a way to SELECT n ON (like DISTINCT ON, but more than one of each)
Viewed 0 times
distincteachbutmorewaylikethanoneselectthere
Problem
I have a table
... and I want to SELECT a sample of
Is there a way to do that cleanly in PostgreSQL 9.3?
I can get one customer from each
But if I want, say, three customers from each state, is there a way I can do this without running the same query multiple times?
us_customers that looks like this (with hundreds of thousands of rows):+----------+----------+
| id | us_state |
+----------+----------+
| 12345678 | MA |
| 23456781 | AL |
| 34567812 | GA |
| 45678123 | FL |
| 56781234 | AZ |
| 67812345 | MA |
| 78123456 | CO |
| 81234567 | FL |
+----------+----------+... and I want to SELECT a sample of
n customers from each us_state.Is there a way to do that cleanly in PostgreSQL 9.3?
I can get one customer from each
us_state easily with:SELECT DISTINCT ON (us_state) id
FROM us_customers
ORDER BY us_state;But if I want, say, three customers from each state, is there a way I can do this without running the same query multiple times?
Solution
You can number and order
Or you can CROSS JOIN with a subquery:
Output with my small sample:
Note that n is the result of ROW_NUMBER and doesn't exist in the second query.
On a big table, an index on the partition (us-state) and order (id here) columns will help.
Sample used:
id by us_state using the ROW_NUMBER() Window Function and only keep the n first values:SELECT *
FROM (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY us_state ORDER BY id) as n
FROM data
) as ord
WHERE n <= 2
ORDER BY us_state
;Or you can CROSS JOIN with a subquery:
SELECT l.*
FROM (
SELECT DISTINCT us_state FROM data
) as s
CROSS JOIN LATERAL (
SELECT *
FROM data d
WHERE d.us_state = s.us_state
ORDER BY id
LIMIT 2
) as l
ORDER BY l.us_state
;- Sample SQL Fiddle here
- I used a small sample of 1 to 3 rows per state. Therefore I limit only to 2 values
- I ordered it by
idsbut you can change that and order by whatever works best for you
Output with my small sample:
id | us_state | n
123 | AL | 1
456 | AL | 2
56781234 | AZ | 1
78123456 | CO | 1
45678123 | FL | 1
81234567 | FL | 2
34567812 | GA | 1
123 | MA | 1
456 | MA | 2Note that n is the result of ROW_NUMBER and doesn't exist in the second query.
On a big table, an index on the partition (us-state) and order (id here) columns will help.
Sample used:
CREATE TABLE data
("id" int, "us_state" varchar(2))
;
INSERT INTO data
("id", "us_state")
VALUES
(12345678, 'MA'),
(123, 'MA'),
(456, 'MA'),
(23456781, 'AL'),
(123, 'AL'),
(456, 'AL'),
(34567812, 'GA'),
(45678123, 'FL'),
(56781234, 'AZ'),
(67812345, 'MA'),
(78123456, 'CO'),
(81234567, 'FL')
;Code Snippets
SELECT *
FROM (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY us_state ORDER BY id) as n
FROM data
) as ord
WHERE n <= 2
ORDER BY us_state
;SELECT l.*
FROM (
SELECT DISTINCT us_state FROM data
) as s
CROSS JOIN LATERAL (
SELECT *
FROM data d
WHERE d.us_state = s.us_state
ORDER BY id
LIMIT 2
) as l
ORDER BY l.us_state
;id | us_state | n
123 | AL | 1
456 | AL | 2
56781234 | AZ | 1
78123456 | CO | 1
45678123 | FL | 1
81234567 | FL | 2
34567812 | GA | 1
123 | MA | 1
456 | MA | 2CREATE TABLE data
("id" int, "us_state" varchar(2))
;
INSERT INTO data
("id", "us_state")
VALUES
(12345678, 'MA'),
(123, 'MA'),
(456, 'MA'),
(23456781, 'AL'),
(123, 'AL'),
(456, 'AL'),
(34567812, 'GA'),
(45678123, 'FL'),
(56781234, 'AZ'),
(67812345, 'MA'),
(78123456, 'CO'),
(81234567, 'FL')
;Context
StackExchange Database Administrators Q#133393, answer score: 10
Revisions (0)
No revisions yet.