patternsqlMinor
Grouping Subsets of Rows with Null Values within an Ordered Set
Viewed 0 times
rowsgroupingwithnullsetsubsetswithinvaluesordered
Problem
Lets say we have an table where each row is a day, and it is ordered by this day column. Then we have left joined a membership data set showing which day the members were active (and not).
Lets say our current data set looks like this... Membership was active from day 3-5, inactive from 5-8, and active from day 9 onward etc.
...so
With this data structure, I would like to get to a "collapsed" set, showing "spans" of time inactive/active:
I have tried using row_number() to somehow partition out the subsets of a certain status, but in this case, using
How can I distinguish the spans of inactive membership from one-another for grouping purposes?
What technique can I use to achieve that output above?
Here is the script to generate the dummy source data:
```
CREATE TABLE ##SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);
INSERT INTO ##SRC (ID, D, MEMBER, ACTIVE)
SELECT 1, '2017-01-01', 123, NULL UNION
SELECT 2, '2017-01-02', 123, NULL UNION
SELECT 3, '2017-01-03', 123, '2017-01-03' UNION
SELECT 4, '2017-01-04', 123, '2017-01-04' UNION
SELECT 5, '2017-01-05', 123, '2017-01-05' UNION
SELECT 6, '2017-01-06', 123, NULL UNION
SELECT 7, '2017-01-
Lets say our current data set looks like this... Membership was active from day 3-5, inactive from 5-8, and active from day 9 onward etc.
DAY DATE MEMBER ACTIVE
1 2017-01-01 123 null
2 2017-01-02 123 null
3 2017-01-03 123 2017-01-03
4 2017-01-04 123 2017-01-04
5 2017-01-05 123 2017-01-05
6 2017-01-06 123 null
7 2017-01-07 123 null
8 2017-01-08 123 null
9 2017-01-09 123 2017-01-09
10 2017-01-10 123 2017-01-10...so
ACTIVE=null means membership was not active on those days.With this data structure, I would like to get to a "collapsed" set, showing "spans" of time inactive/active:
MEMBER MIN(DATE) MAX(DATE) STATUS
123, 2017-01-01, 2017-01-02 INACTIVE
123, 2017-01-03, 2017-01-05 ACTIVE
123, 2017-01-06, 2017-01-08 INACTIVE
123, 2017-01-09, 2017-01-10 ACTIVEI have tried using row_number() to somehow partition out the subsets of a certain status, but in this case, using
min()/max() over the rows where ACTIVE is null, treats those as a single group, when in reality, there are several distinct spans of "inactive membership". How can I distinguish the spans of inactive membership from one-another for grouping purposes?
What technique can I use to achieve that output above?
Here is the script to generate the dummy source data:
```
CREATE TABLE ##SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);
INSERT INTO ##SRC (ID, D, MEMBER, ACTIVE)
SELECT 1, '2017-01-01', 123, NULL UNION
SELECT 2, '2017-01-02', 123, NULL UNION
SELECT 3, '2017-01-03', 123, '2017-01-03' UNION
SELECT 4, '2017-01-04', 123, '2017-01-04' UNION
SELECT 5, '2017-01-05', 123, '2017-01-05' UNION
SELECT 6, '2017-01-06', 123, NULL UNION
SELECT 7, '2017-01-
Solution
Your sample data does not match your description and confused me at first. As sp_BlitzErik points out this is an island-and-gap problem. The solution is pretty straightforward if you have access to window functions. First, we can enumerate the table per member alone, let's call this full_order (this happens to be the same as day, but I'll ad it for generality). Second, we can enumerate the table per member and whether they were active on that day, let's call this partial_order
If the difference between full_order and partial_order changes, it means that active has changed from null to a value, or vice versa. Therefore we can form a group with this difference. Within each such group we can pick the min(active) and max(active) to form an interval:
It's probably easiest to add another level of nesting to get the desired result:
select day, active, date, member
, row_number() over (partition by member
order by day) as fullorder
, row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as partialorder
from src
DAY ACTIVE MEMBER FULLORDER PARTIALORDER
----------- ---------- ----------- -------------------- --------------------
1 - 123 1 1
2 - 123 2 2
3 01/03/2017 123 3 1
4 01/04/2017 123 4 2
5 01/05/2017 123 5 3
6 - 123 6 3
7 - 123 7 4
8 - 123 8 5
9 01/09/2017 123 9 4
10 01/10/2017 123 10 5If the difference between full_order and partial_order changes, it means that active has changed from null to a value, or vice versa. Therefore we can form a group with this difference. Within each such group we can pick the min(active) and max(active) to form an interval:
select member, grp, min(date), max(active)
from (
select day, active, date, member
, row_number() over (partition by member order by day)
- row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as grp
from src
)
group by member, grp
MEMBER GRP 3 4
----------- -------------------- ---------- ----------
123 0 01/01/2017 -
123 2 01/03/2017 01/05/2017
123 3 01/05/2017 -
123 5 01/08/2017 01/10/2017It's probably easiest to add another level of nesting to get the desired result:
select member, min_active
, coalesce(max_active, min_active) as max_active
, case when max_active is null then 'INACTIVE' else 'ACTIVE' end as status
from (
select member, grp, min(date) as min_active, max(active) as max_active
from (
select day, active, date, member
, row_number() over (partition by member order by day)
- row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as grp
from src
)
group by member, grp)
MEMBER MIN_ACTIVE MAX_ACTIVE STATUS
----------- ---------- ---------- --------
123 01/01/2017 01/01/2017 INACTIVE
123 01/03/2017 01/05/2017 ACTIVE
123 01/05/2017 01/05/2017 INACTIVE
123 01/08/2017 01/10/2017 ACTIVECode Snippets
select day, active, date, member
, row_number() over (partition by member
order by day) as fullorder
, row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as partialorder
from src
DAY ACTIVE MEMBER FULLORDER PARTIALORDER
----------- ---------- ----------- -------------------- --------------------
1 - 123 1 1
2 - 123 2 2
3 01/03/2017 123 3 1
4 01/04/2017 123 4 2
5 01/05/2017 123 5 3
6 - 123 6 3
7 - 123 7 4
8 - 123 8 5
9 01/09/2017 123 9 4
10 01/10/2017 123 10 5select member, grp, min(date), max(active)
from (
select day, active, date, member
, row_number() over (partition by member order by day)
- row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as grp
from src
)
group by member, grp
MEMBER GRP 3 4
----------- -------------------- ---------- ----------
123 0 01/01/2017 -
123 2 01/03/2017 01/05/2017
123 3 01/05/2017 -
123 5 01/08/2017 01/10/2017select member, min_active
, coalesce(max_active, min_active) as max_active
, case when max_active is null then 'INACTIVE' else 'ACTIVE' end as status
from (
select member, grp, min(date) as min_active, max(active) as max_active
from (
select day, active, date, member
, row_number() over (partition by member order by day)
- row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as grp
from src
)
group by member, grp)
MEMBER MIN_ACTIVE MAX_ACTIVE STATUS
----------- ---------- ---------- --------
123 01/01/2017 01/01/2017 INACTIVE
123 01/03/2017 01/05/2017 ACTIVE
123 01/05/2017 01/05/2017 INACTIVE
123 01/08/2017 01/10/2017 ACTIVEContext
StackExchange Database Administrators Q#191005, answer score: 3
Revisions (0)
No revisions yet.