patternsqlMinor
Row number with reset in PostgreSQL
Viewed 0 times
postgresqlnumberwithrowreset
Problem
I have this data set in PostgreSQL:
User_id | date | is_cleared |
---------------+-------------------------+------------+
C | 2016-08-03 18:28:38.37 | 1 |
A | 2016-08-01 19:59:36.548 | 0 |
A | 2016-08-01 23:38:29.912 | 0 |
A | 2016-08-04 19:03:30.585 | 0 |
A | 2016-08-08 19:53:20.251 | 0 |
A | 2016-08-11 21:00:16.075 | 0 |
A | 2016-08-13 21:07:01.336 | 0 |
A | 2016-08-15 10:17:32.746 | 0 |
A | 2016-08-17 11:54:16.75 | 0 |
A | 2016-08-18 21:12:22.717 | 0 |
A | 2016-08-21 19:59:26.441 | 0 |
A | 2016-08-22 17:19:42.215 | 0 |
A | 2016-08-24 20:18:45.516 | 1 |
A | 2016-08-27 13:22:33.98 | 1 |
A | 2016-08-31 16:09:25.756 | 0 |
B | 2016-08-03 21:10:32.945 | 1 |
B | 2016-08-05 20:40:55.644 | 1 |
B | 2016-08-06 10:59:00.796 | 1 |
B | 2016-08-09 22:36:50.674 | 0 |
B | 2016-08-11 09:37:21.29 | 1 |
B | 2016-08-13 10:15:15.858 | 1 |
B | 2016-08-26 09:56:48.937 | 1 |
B | 2016-08-29 09:34:13.96 | 1 |
And I need an output like this:
User_id | date | is_cleared | custom_rank
---------------+-------------------------+------------+---------
C | 2016-08-03 18:28:38.37 | 1 | 1
A | 2016-08-01 19:59:36.548 | 0 | 1
A | 2016-08-01 23:38:29.912 | 0 | 2
A | 2016-08-04 19:03:30.585 | 0 | 3
A | 2016-08-08 19:53:20.251 | 0 | 4
A | 2016-08-11 21:00:16.075 | 0 | 5
A | 2016-08-13 21:07:01.3
User_id | date | is_cleared |
---------------+-------------------------+------------+
C | 2016-08-03 18:28:38.37 | 1 |
A | 2016-08-01 19:59:36.548 | 0 |
A | 2016-08-01 23:38:29.912 | 0 |
A | 2016-08-04 19:03:30.585 | 0 |
A | 2016-08-08 19:53:20.251 | 0 |
A | 2016-08-11 21:00:16.075 | 0 |
A | 2016-08-13 21:07:01.336 | 0 |
A | 2016-08-15 10:17:32.746 | 0 |
A | 2016-08-17 11:54:16.75 | 0 |
A | 2016-08-18 21:12:22.717 | 0 |
A | 2016-08-21 19:59:26.441 | 0 |
A | 2016-08-22 17:19:42.215 | 0 |
A | 2016-08-24 20:18:45.516 | 1 |
A | 2016-08-27 13:22:33.98 | 1 |
A | 2016-08-31 16:09:25.756 | 0 |
B | 2016-08-03 21:10:32.945 | 1 |
B | 2016-08-05 20:40:55.644 | 1 |
B | 2016-08-06 10:59:00.796 | 1 |
B | 2016-08-09 22:36:50.674 | 0 |
B | 2016-08-11 09:37:21.29 | 1 |
B | 2016-08-13 10:15:15.858 | 1 |
B | 2016-08-26 09:56:48.937 | 1 |
B | 2016-08-29 09:34:13.96 | 1 |
And I need an output like this:
User_id | date | is_cleared | custom_rank
---------------+-------------------------+------------+---------
C | 2016-08-03 18:28:38.37 | 1 | 1
A | 2016-08-01 19:59:36.548 | 0 | 1
A | 2016-08-01 23:38:29.912 | 0 | 2
A | 2016-08-04 19:03:30.585 | 0 | 3
A | 2016-08-08 19:53:20.251 | 0 | 4
A | 2016-08-11 21:00:16.075 | 0 | 5
A | 2016-08-13 21:07:01.3
Solution
Window functions and a single subquery:
The special difficulty here: each partition ends with
Assuming all involved columns to be
Basics:
Asides:
SELECT user_id, date, is_cleared
, count(*) OVER (PARTITION BY user_id, grp ORDER BY date) AS custom_rank
FROM (
SELECT user_id, date, is_cleared
, count(is_cleared = 1 OR NULL) OVER (PARTITION BY user_id ORDER BY date DESC) AS grp
FROM tbl
) t
ORDER BY user_id, date;The special difficulty here: each partition ends with
1. In most similar cases, each partition would start with 1. The simple solution is to count occurrences of 1 in descending order. This way each partition includes the next 1 if it exists. Voilá.Assuming all involved columns to be
NOT NULL and (user_id, date) unique.Basics:
- Select longest continuous sequence
Asides:
- The first row of
Bgetscustom_rank = 1. Not2.
- The column
is_clearedshould really beboolean.
- A
timestampcolumn shouldn't be called "date".
Code Snippets
SELECT user_id, date, is_cleared
, count(*) OVER (PARTITION BY user_id, grp ORDER BY date) AS custom_rank
FROM (
SELECT user_id, date, is_cleared
, count(is_cleared = 1 OR NULL) OVER (PARTITION BY user_id ORDER BY date DESC) AS grp
FROM tbl
) t
ORDER BY user_id, date;Context
StackExchange Database Administrators Q#149519, answer score: 2
Revisions (0)
No revisions yet.