patternsqlModerate
Get the last 5 distinct values for each ID
Viewed 0 times
distinctlasttheeachgetforvalues
Problem
I'm working with PostgreSQL 9.4.
I have a table that contains the following entries:
What I want to achieve is a query that returns the 5 most recent unique postcode records for each id:
What would be the best way of achieving this? I've been playing around with subqueries but keep hitting walls when it comes to ordering them whilst doing a
I have a table that contains the following entries:
id | postcode | date_created
---+----------+-----------------
14 | al2 2qp | 2015-09-23 14:46:57
14 | al2 2qp | 2015-09-23 14:51:07
14 | sp2 8ag | 2015-09-23 14:56:11
14 | se4 | 2015-09-23 16:12:05
17 | e2 | 2015-09-23 16:15:35
17 | fk20 8ru | 2015-09-23 16:28:35
17 | fk20 8ru | 2015-09-23 16:35:51
17 | se2 | 2015-09-23 16:36:17
17 | fk20 8ru | 2015-09-23 16:36:22
17 | fk20 8ru | 2015-09-23 16:37:04
17 | se1 | 2015-09-23 16:37:11
17 | fk20 8ru | 2015-09-23 16:37:15
17 | se1 8ga | 2015-09-24 09:52:46
17 | se1 | 2015-09-24 10:01:19
17 | hp27 9rz | 2015-09-24 10:05:27
17 | hp27 9rz | 2015-09-24 10:05:29
17 | se1 | 2015-09-24 10:19:46
14 | tn21 8qb | 2015-09-24 14:49:05
14 | tn21 8qb | 2015-09-24 15:42:45
14 | tn21 8qb | 2015-09-24 17:38:06
14 | n4 1ny | 2015-09-25 14:49:10What I want to achieve is a query that returns the 5 most recent unique postcode records for each id:
id | postcode
---+---------
14 | n4 1ny
14 | tn21 8qb
14 | se4
14 | sp2 8ag
14 | al2 2qp
17 | se1
17 | hp27 9rz
17 | se1 8ga
17 | fk20 8ru
17 | se2What would be the best way of achieving this? I've been playing around with subqueries but keep hitting walls when it comes to ordering them whilst doing a
DISTINCT and GROUP BY.Solution
There are probably many ways to do this. The first that comes to mind is to use window functions:
Test at SQLfiddle.
If there are ties, say the 5th, 6th and 7th
Another option is to use the
Adding an index on
SELECT
id, postcode
FROM
( SELECT id, postcode,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY MAX(date_created) DESC
) AS rn
FROM tablename
GROUP BY id, postcode
) AS t
WHERE
rn <= 5
ORDER BY
id, rn ;Test at SQLfiddle.
If there are ties, say the 5th, 6th and 7th
postcode for an id have the same date_created, only one of them (choice will be arbitrary) will be in the results. If you want all the tied postcodes in those cases, use RANK() instead of ROW_NUMBER().Another option is to use the
LATERAL syntax. I'm not sure which will be more efficient, it wil probably depend on the values distribution of the two columns (id and postcode), i.e. how many distinct ids in the whole table, how many distinct postcodes per id and how many rows per (id, postcode) combinations.SELECT
t.id, ti.postcode
FROM
( SELECT DISTINCT id
FROM tablename
) AS t
CROSS JOIN LATERAL
( SELECT tt.postcode,
MAX(tt.date_created) AS date_created
FROM tablename AS tt
WHERE tt.id = t.id
GROUP BY tt.postcode
ORDER BY date_created DESC
LIMIT 5
) AS ti
ORDER BY
t.id, ti.date_created DESC;Adding an index on
(id, postcode, date_created) would be a good idea, too - or on (id, postcode, date_created DESC).Code Snippets
SELECT
id, postcode
FROM
( SELECT id, postcode,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY MAX(date_created) DESC
) AS rn
FROM tablename
GROUP BY id, postcode
) AS t
WHERE
rn <= 5
ORDER BY
id, rn ;SELECT
t.id, ti.postcode
FROM
( SELECT DISTINCT id
FROM tablename
) AS t
CROSS JOIN LATERAL
( SELECT tt.postcode,
MAX(tt.date_created) AS date_created
FROM tablename AS tt
WHERE tt.id = t.id
GROUP BY tt.postcode
ORDER BY date_created DESC
LIMIT 5
) AS ti
ORDER BY
t.id, ti.date_created DESC;Context
StackExchange Database Administrators Q#116142, answer score: 11
Revisions (0)
No revisions yet.