patternsqlMinor
GROUP BY possible sequences
Viewed 0 times
possiblesequencesgroup
Problem
I have a table containing the following data, using Postgres 9.6:
log_id | sequence | made_at (timestamp)
206480 1 1
206480 1 2
206480 2 3
206480 3 4
206480 1 5
206480 2 6
206480 4 7
206480 5 8
206480 1 9
206480 2 10
206481 1 11
206481 2 12
206481 3 13
206481 4 14
I have to group and aggregate on the ID so I get an array of possible sequences. In the end I want the data to look like this:
log_id | sequence
206480 {1,1,2,3}
206480 {1,2,4,5}
206480 {1,2}
206481 {1,2,3,4}
I want a new row (with the sequences) when:
There is another column which specifies the ordering (a timestamp), but it's in another table (I join them and use that timestamp). I left it out to make things easier, but we can assume the column is called
log_id | sequence | made_at (timestamp)
206480 1 1
206480 1 2
206480 2 3
206480 3 4
206480 1 5
206480 2 6
206480 4 7
206480 5 8
206480 1 9
206480 2 10
206481 1 11
206481 2 12
206481 3 13
206481 4 14
I have to group and aggregate on the ID so I get an array of possible sequences. In the end I want the data to look like this:
log_id | sequence
206480 {1,1,2,3}
206480 {1,2,4,5}
206480 {1,2}
206481 {1,2,3,4}
I want a new row (with the sequences) when:
- the
log_idchanges; or
- the next sequence number is lower than the current sequence number.
There is another column which specifies the ordering (a timestamp), but it's in another table (I join them and use that timestamp). I left it out to make things easier, but we can assume the column is called
made_at.Solution
select log_id
,array_agg (sequence)
from (select log_id
,sequence
,count (is_restart) over
(
partition by log_id
order by made_at
) as restart_id
from (select made_at
,log_id
,sequence
,case
when sequence <
lag (sequence) over
(
partition by log_id
order by made_at
)
then 1
end is_restart
from logs
) l
) l
group by log_id
,restart_id
order by log_id
,restart_id
;+--------+-----------+
| log_id | array_agg |
+--------+-----------+
| 206480 | {1,1,2,3} |
+--------+-----------+
| 206480 | {1,2,4,5} |
+--------+-----------+
| 206480 | {1,2} |
+--------+-----------+
| 206481 | {1,2,3,4} |
+--------+-----------+
Walkthrough
-
Identify restarts by comparing current sequence to previous sequence (
LAG).select made_at
,log_id
,sequence
,case
when sequence <
lag (sequence) over
(
partition by log_id
order by made_at
)
then 1
end is_restart
from logs+---------+--------+----------+------------+
| made_at | log_id | sequence | is_restart |
+---------+--------+----------+------------+
| 1 | 206480 | 1 | |
+---------+--------+----------+------------+
| 2 | 206480 | 1 | |
+---------+--------+----------+------------+
| 3 | 206480 | 2 | |
+---------+--------+----------+------------+
| 4 | 206480 | 3 | |
+---------+--------+----------+------------+
| 5 | 206480 | 1 | 1 |
+---------+--------+----------+------------+
| 6 | 206480 | 2 | |
+---------+--------+----------+------------+
| 7 | 206480 | 4 | |
+---------+--------+----------+------------+
| 8 | 206480 | 5 | |
+---------+--------+----------+------------+
| 9 | 206480 | 1 | 1 |
+---------+--------+----------+------------+
| 10 | 206480 | 2 | |
+---------+--------+----------+------------+
| 11 | 206481 | 1 | |
+---------+--------+----------+------------+
| 12 | 206481 | 2 | |
+---------+--------+----------+------------+
| 13 | 206481 | 3 | |
+---------+--------+----------+------------+
| 14 | 206481 | 4 | |
+---------+--------+----------+------------+
-
Do "running counts" (similar to "Running totals") of restarts (
is_restart).Rows that belongs to the same group will have the same count (AKA
restart_id).The "Order by" in
COUNT implies range between unbounded preceding and current row select log_id
,sequence
,count (is_restart) over
(
partition by log_id
order by made_at
) as group_id
from (...) l+--------+----------+----------+
| log_id | sequence | group_id |
+--------+----------+----------+
| 206480 | 1 | 0 |
+--------+----------+----------+
| 206480 | 1 | 0 |
+--------+----------+----------+
| 206480 | 2 | 0 |
+--------+----------+----------+
| 206480 | 3 | 0 |
+--------+----------+----------+
| 206480 | 1 | 1 |
+--------+----------+----------+
| 206480 | 2 | 1 |
+--------+----------+----------+
| 206480 | 4 | 1 |
+--------+----------+----------+
| 206480 | 5 | 1 |
+--------+----------+----------+
| 206480 | 1 | 2 |
+--------+----------+----------+
| 206480 | 2 | 2 |
+--------+----------+----------+
| 206481 | 1 | 0 |
+--------+----------+----------+
| 206481 | 2 | 0 |
+--------+----------+----------+
| 206481 | 3 | 0 |
+--------+----------+----------+
| 206481 | 4 | 0 |
+--------+----------+----------+
-
Group by
log_id and restart_id and aggregate sequence```
select log_id
,array_agg (sequence)
from (...) l
group by log_id
Code Snippets
select log_id
,array_agg (sequence)
from (select log_id
,sequence
,count (is_restart) over
(
partition by log_id
order by made_at
) as restart_id
from (select made_at
,log_id
,sequence
,case
when sequence <
lag (sequence) over
(
partition by log_id
order by made_at
)
then 1
end is_restart
from logs
) l
) l
group by log_id
,restart_id
order by log_id
,restart_id
;select made_at
,log_id
,sequence
,case
when sequence <
lag (sequence) over
(
partition by log_id
order by made_at
)
then 1
end is_restart
from logsselect log_id
,sequence
,count (is_restart) over
(
partition by log_id
order by made_at
) as group_id
from (...) lselect log_id
,array_agg (sequence)
from (...) l
group by log_id
,restart_id
order by log_id
,restart_id
;Context
StackExchange Database Administrators Q#156893, answer score: 6
Revisions (0)
No revisions yet.