HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

GROUP BY possible sequences

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
possiblesequencesgroup

Problem

I have a table containing the following data, using Postgres 9.6:

log_id | sequence | made_at (timestamp)
206480 1 1
206480 1 2
206480 2 3
206480 3 4
206480 1 5
206480 2 6
206480 4 7
206480 5 8
206480 1 9
206480 2 10
206481 1 11
206481 2 12
206481 3 13
206481 4 14

I have to group and aggregate on the ID so I get an array of possible sequences. In the end I want the data to look like this:

log_id | sequence
206480 {1,1,2,3}
206480 {1,2,4,5}
206480 {1,2}
206481 {1,2,3,4}

I want a new row (with the sequences) when:

  • the log_id changes; or



  • the next sequence number is lower than the current sequence number.



There is another column which specifies the ordering (a timestamp), but it's in another table (I join them and use that timestamp). I left it out to make things easier, but we can assume the column is called made_at.

Solution

select      log_id
           ,array_agg (sequence)

from       (select      log_id 
                       ,sequence
                       ,count (is_restart) over
                        (
                            partition by    log_id 
                            order by        made_at
                        ) as restart_id

            from        (select      made_at
                                    ,log_id 
                                    ,sequence
                                    ,case 
                                         when sequence <
                                              lag (sequence) over
                                              (
                                                  partition by    log_id 
                                                  order by        made_at
                                              ) 
                                         then 1
                                     end            is_restart

                         from        logs
                         ) l
            ) l

group by    log_id      
           ,restart_id

order by    log_id      
           ,restart_id
;


+--------+-----------+
| log_id | array_agg |
+--------+-----------+
| 206480 | {1,1,2,3} |
+--------+-----------+
| 206480 | {1,2,4,5} |
+--------+-----------+
| 206480 | {1,2} |
+--------+-----------+
| 206481 | {1,2,3,4} |
+--------+-----------+


Walkthrough

-
Identify restarts by comparing current sequence to previous sequence (LAG).

select      made_at
           ,log_id 
           ,sequence

           ,case 
                when sequence <
                     lag (sequence) over
                     (
                         partition by    log_id 
                         order by        made_at
                     ) 
                then 1
            end            is_restart

from        logs


+---------+--------+----------+------------+
| made_at | log_id | sequence | is_restart |
+---------+--------+----------+------------+
| 1 | 206480 | 1 | |
+---------+--------+----------+------------+
| 2 | 206480 | 1 | |
+---------+--------+----------+------------+
| 3 | 206480 | 2 | |
+---------+--------+----------+------------+
| 4 | 206480 | 3 | |
+---------+--------+----------+------------+
| 5 | 206480 | 1 | 1 |
+---------+--------+----------+------------+
| 6 | 206480 | 2 | |
+---------+--------+----------+------------+
| 7 | 206480 | 4 | |
+---------+--------+----------+------------+
| 8 | 206480 | 5 | |
+---------+--------+----------+------------+
| 9 | 206480 | 1 | 1 |
+---------+--------+----------+------------+
| 10 | 206480 | 2 | |
+---------+--------+----------+------------+
| 11 | 206481 | 1 | |
+---------+--------+----------+------------+
| 12 | 206481 | 2 | |
+---------+--------+----------+------------+
| 13 | 206481 | 3 | |
+---------+--------+----------+------------+
| 14 | 206481 | 4 | |
+---------+--------+----------+------------+


-
Do "running counts" (similar to "Running totals") of restarts (is_restart).

Rows that belongs to the same group will have the same count (AKA restart_id).

The "Order by" in COUNT implies range between unbounded preceding and current row

select      log_id 
           ,sequence
           ,count (is_restart) over
            (
                partition by    log_id 
                order by        made_at
            ) as group_id

from        (...) l


+--------+----------+----------+
| log_id | sequence | group_id |
+--------+----------+----------+
| 206480 | 1 | 0 |
+--------+----------+----------+
| 206480 | 1 | 0 |
+--------+----------+----------+
| 206480 | 2 | 0 |
+--------+----------+----------+
| 206480 | 3 | 0 |
+--------+----------+----------+
| 206480 | 1 | 1 |
+--------+----------+----------+
| 206480 | 2 | 1 |
+--------+----------+----------+
| 206480 | 4 | 1 |
+--------+----------+----------+
| 206480 | 5 | 1 |
+--------+----------+----------+
| 206480 | 1 | 2 |
+--------+----------+----------+
| 206480 | 2 | 2 |
+--------+----------+----------+
| 206481 | 1 | 0 |
+--------+----------+----------+
| 206481 | 2 | 0 |
+--------+----------+----------+
| 206481 | 3 | 0 |
+--------+----------+----------+
| 206481 | 4 | 0 |
+--------+----------+----------+


-
Group by log_id and restart_id and aggregate sequence

```
select log_id
,array_agg (sequence)

from (...) l

group by log_id

Code Snippets

select      log_id
           ,array_agg (sequence)

from       (select      log_id 
                       ,sequence
                       ,count (is_restart) over
                        (
                            partition by    log_id 
                            order by        made_at
                        ) as restart_id

            from        (select      made_at
                                    ,log_id 
                                    ,sequence
                                    ,case 
                                         when sequence <
                                              lag (sequence) over
                                              (
                                                  partition by    log_id 
                                                  order by        made_at
                                              ) 
                                         then 1
                                     end            is_restart

                         from        logs
                         ) l
            ) l

group by    log_id      
           ,restart_id

order by    log_id      
           ,restart_id
;
select      made_at
           ,log_id 
           ,sequence

           ,case 
                when sequence <
                     lag (sequence) over
                     (
                         partition by    log_id 
                         order by        made_at
                     ) 
                then 1
            end            is_restart

from        logs
select      log_id 
           ,sequence
           ,count (is_restart) over
            (
                partition by    log_id 
                order by        made_at
            ) as group_id

from        (...) l
select      log_id
           ,array_agg (sequence)

from       (...) l

group by    log_id      
           ,restart_id

order by    log_id      
           ,restart_id
;

Context

StackExchange Database Administrators Q#156893, answer score: 6

Revisions (0)

No revisions yet.