patternsqlMinor
Grouping rows in ranges
Viewed 0 times
rangesrowsgrouping
Problem
Let's say I have the following (contrived) table of monitoring data:
I'd like to group rows with the same status and error, but only if they fall next to each other in sequence by date or by ID (ID will be ordered correctly, but is not guaranteed to be sequential), as follows:
A simple
Since that combines all rows with the same status/error, regardless of when they occur:
```
Starting Ending Polls Success Error
----------------- ----------------- ----- ------- -----------
5/24/2019 11:23am 5/24/2019 11:32am 6 1 None
5/24/2019 11:26am 5/2
RowID DateStamp Success Error
----- ----------------- ------- -----------
1001 5/24/2019 11:23am 1 None
1004 5/24/2019 11:24am 1 None
1005 5/24/2019 11:25am 1 None
1009 5/24/2019 11:26am 0 SQL Timeout
1018 5/24/2019 11:27am 0 SQL Timeout
1019 5/24/2019 11:28am 1 None
1026 5/24/2019 11:29am 1 None
1035 5/24/2019 11:30am 0 Planned Maintenance
1100 5/24/2019 11:31am 0 Planned Maintenance
1111 5/24/2019 11:32am 1 NoneI'd like to group rows with the same status and error, but only if they fall next to each other in sequence by date or by ID (ID will be ordered correctly, but is not guaranteed to be sequential), as follows:
Starting Ending Polls Success Error
----------------- ----------------- ----- ------- -----------
5/24/2019 11:23am 5/24/2019 11:25am 3 1 None
5/24/2019 11:26am 5/24/2019 11:27am 2 0 SQL Timeout
5/24/2019 11:28am 5/24/2019 11:29am 2 1 None
5/24/2019 11:30am 5/24/2019 11:31am 2 0 Planned Maintenance
5/24/2019 11:32am 5/24/2019 11:32am 1 1 NoneA simple
GROUP BY doesn't work:SELECT MIN(DateStamp) as Starting, MAX(DateStamp) as Ending,
Count(*) as Polls, Success, Error
FROM myTable
GROUP BY Success, Error
ORDER BY StartingSince that combines all rows with the same status/error, regardless of when they occur:
```
Starting Ending Polls Success Error
----------------- ----------------- ----- ------- -----------
5/24/2019 11:23am 5/24/2019 11:32am 6 1 None
5/24/2019 11:26am 5/2
Solution
An alternative way to identify the islands is using the
The
The
The final step is to just group by
You can use this live demo at db<>fiddle.uk for playing with this solution.
LAG analytic function and the SUM() window aggregate function:WITH
sentinels AS
(
SELECT
DateStamp,
Success,
Error,
Grp = CASE
WHEN Success = LAG(Success) OVER (ORDER BY DateStamp ASC)
AND Error = LAG(Error ) OVER (ORDER BY DateStamp ASC)
THEN 0
ELSE 1
END
FROM
dbo.myTable
),
fullyMarked AS
(
SELECT
DateStamp,
Success,
Error,
Grp = SUM(Grp) OVER (ORDER BY DateStamp ASC ROWS UNBOUNDED PRECEDING)
FROM
sentinels
)
SELECT
Starting = MIN(DateStamp),
Ending = MAX(DateStamp),
Polls = COUNT(*),
Success,
Error
FROM
fullyMarked
GROUP BY
Success,
Error,
Grp
ORDER BY
Starting ASC
;The
sentinels CTE marks the first row of each island with a 1 and the others with a 0:RowID DateStamp Success Error Grp
----- ----------------- ------- ------------------- ---
1001 5/24/2019 11:23am 1 None 1
1004 5/24/2019 11:24am 1 None 0
1005 5/24/2019 11:25am 1 None 0
1009 5/24/2019 11:26am 0 SQL Timeout 1
1018 5/24/2019 11:27am 0 SQL Timeout 0
1019 5/24/2019 11:28am 1 None 1
1026 5/24/2019 11:29am 1 None 0
1035 5/24/2019 11:30am 0 Planned Maintenance 1
1100 5/24/2019 11:31am 0 Planned Maintenance 0
1111 5/24/2019 11:32am 1 None 1
The
fullyMarked CTE takes the output of sentinels and calculates the running total of the Grp column, thus effectively assigning a unique ID to each island:RowID DateStamp Success Error Grp
----- ----------------- ------- ------------------- ---
1001 5/24/2019 11:23am 1 None 1 -- 1
1004 5/24/2019 11:24am 1 None 1 -- 1+0
1005 5/24/2019 11:25am 1 None 1 -- 1+0+0
1009 5/24/2019 11:26am 0 SQL Timeout 2 -- 1+0+0+1
1018 5/24/2019 11:27am 0 SQL Timeout 2 -- 1+0+0+1+0
1019 5/24/2019 11:28am 1 None 3 -- 1+0+0+1+0+1
1026 5/24/2019 11:29am 1 None 3 -- 1+0+0+1+0+1+0
1035 5/24/2019 11:30am 0 Planned Maintenance 4 -- 1+0+0+1+0+1+0+1
1100 5/24/2019 11:31am 0 Planned Maintenance 4 -- 1+0+0+1+0+1+0+1+0
1111 5/24/2019 11:32am 1 None 5 -- 1+0+0+1+0+1+0+1+0+1
The final step is to just group by
Success, Error, Grp and aggregate data of other columns as required, which gives you the expected result.You can use this live demo at db<>fiddle.uk for playing with this solution.
Code Snippets
WITH
sentinels AS
(
SELECT
DateStamp,
Success,
Error,
Grp = CASE
WHEN Success = LAG(Success) OVER (ORDER BY DateStamp ASC)
AND Error = LAG(Error ) OVER (ORDER BY DateStamp ASC)
THEN 0
ELSE 1
END
FROM
dbo.myTable
),
fullyMarked AS
(
SELECT
DateStamp,
Success,
Error,
Grp = SUM(Grp) OVER (ORDER BY DateStamp ASC ROWS UNBOUNDED PRECEDING)
FROM
sentinels
)
SELECT
Starting = MIN(DateStamp),
Ending = MAX(DateStamp),
Polls = COUNT(*),
Success,
Error
FROM
fullyMarked
GROUP BY
Success,
Error,
Grp
ORDER BY
Starting ASC
;Context
StackExchange Database Administrators Q#239002, answer score: 3
Revisions (0)
No revisions yet.