patternsqlMinor
Finding gaps on intersection dates
Viewed 0 times
datesintersectiongapsfinding
Problem
This is a problem I've been trying to work through on and off for the past month, and haven't made any progress. Essentially, I am working with a HR system that stores two major types of position for employees: Primary (P) and Acting (A). The position types aren't hugely relevant, but are important to know to understand the way the data is structured. An employee will always be in a Primary position, but may be put into an Acting position temporarily while it is vacant, or while the Primary person for that role is on leave.
The way that this is represented in the system is diagrammed below (as an example and not to scale):
Due to a set of data I'm trying to join the position information with, I need it to be intersected to look like this, where the Acting position always takes precedence over the Primary position.
Below is a representation of the data directly from the database, to give an idea of what it looks like overall:
```
Employee Position Start Date End Date Position Type
1 30 2016-12-10 2016-12-11 P
1 30 2016-12-12 2017-12-08 P
1 20 2017-01-31 2017-02-02 A
1 20 2017-03-29 2017-03-31 A
1 20 2017-04-19 2017-04-21 A
1 20 2017-06-22 2017-06-23 A
1 20 2017-06-29 2017-06
The way that this is represented in the system is diagrammed below (as an example and not to scale):
P (2018-07-28 - 9999-99-99)
A (2018-09-04 - 2018-09-05)
A (2018-11-12 - 2018-11-13)
A (2018-11-19 - 2018-11-20) Due to a set of data I'm trying to join the position information with, I need it to be intersected to look like this, where the Acting position always takes precedence over the Primary position.
P 2018-07-28 - 2018-09-03
A 2018-09-04 - 2018-09-05
P 2018-09-06 - 2018-11-11
A 2018-11-12 - 2018-11-13
P 2018-11-14 - 2018-11-18
A 2018-11-19 - 2018-11-20
P 2018-11-21 - 9999-99-99
Below is a representation of the data directly from the database, to give an idea of what it looks like overall:
```
Employee Position Start Date End Date Position Type
1 30 2016-12-10 2016-12-11 P
1 30 2016-12-12 2017-12-08 P
1 20 2017-01-31 2017-02-02 A
1 20 2017-03-29 2017-03-31 A
1 20 2017-04-19 2017-04-21 A
1 20 2017-06-22 2017-06-23 A
1 20 2017-06-29 2017-06
Solution
First, I think the link you had supplied doesn't fit for your question because it is about combining overlapped dates, and you need to generate (GAPs) new dates.
I would divide your data in two groups, those corresponding to Primary(s) and those that belong to Acting(s).
Employee | Position | StartDate | EndDate | PosType | P1
-------: | -------: | :----------| :----------| :------ | :-
1 | 30 | 10/12/2016 | 11/12/2016 | P | 1
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3
1 | 30 | 27/07/2018 | 27/07/2018 | P | 4
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5
Next Acting rows are subsumed into the second Principal row:
Employee | Position | StartDate | EndDate | PosType | P2
-------: | -------: | :----------| :----------| :------ | :--
1 | 20 | 31/01/2017 | 02/02/2017 | A | 1
1 | 20 | 29/03/2017 | 31/03/2017 | A | 2
1 | 20 | 19/04/2017 | 21/04/2017 | A | 3
1 | 20 | 22/06/2017 | 23/06/2017 | A | 4
1 | 20 | 29/06/2017 | 30/06/2017 | A | 5
1 | 20 | 03/10/2017 | 06/10/2017 | A | 6
Now let me generate a new dataset, for each Primary I'll add the corresponding Acting(s) using an OUTER JOIN, this outer join returns all Primary(s) even if they haven't subsumed Acting(s).
NOTE: I've used a temporary table plus and index, but you could try by using a CTE. You should check performance of both options.
This is the new scenario:
Employee | Position | StartDate | EndDate | PosType | P1 | P2 | P | PrevED | SD | ED | NextSD | PT
-------: | -------: | :----------| :----------| :------ | :- | :- | ---: | :----------| :----------| :----------| :----------| :---
1 | 30 | 10/12/2016 | 11/12/2016 | P | 1 | 0 | null | null | null | null | null | null
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 1 | 20 | null | 31/01/2017 | 02/02/2017 | 29/03/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 2 | 20 | 02/02/2017 | 29/03/2017 | 31/03/2017 | 19/04/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 3 | 20 | 31/03/2017 | 19/04/2017 | 21/04/2017 | 22/06/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 4 | 20 | 21/04/2017 | 22/06/2017 | 23/06/2017 | 29/06/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 5 | 20 | 23/06/2017 | 29/06/2017 | 30/06/2017 | 03/10/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 6 | 20 | 30/06/2017 | 03/10/2017 | 06/10/2017 | null | A
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3 | 1 | 20 | null | 12/02/2018 | 02/03/2018 | 19/07/2018 | A
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3 | 2 | 20 | 02/03/2018 | 19/07/2018 | 20/07/2018 | null | A
1 | 30 | 27/07/2018 | 27/07/2018 | P | 4 | 0 | null | null | null | null | null | null
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5 | 1 | 20 | null | 04/09/2018 | 05/09/2018 | 12/11/2018 | A
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5 | 2 | 20 | 05/09/2018 | 12/11/2018 | 13/11/2018 | 19/11/2018 | A
1 | 30 | 28/07/2018
I would divide your data in two groups, those corresponding to Primary(s) and those that belong to Acting(s).
SELECT t1.*,
ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) P1
FROM tbl t1
WHERE t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate
;
GOEmployee | Position | StartDate | EndDate | PosType | P1
-------: | -------: | :----------| :----------| :------ | :-
1 | 30 | 10/12/2016 | 11/12/2016 | P | 1
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3
1 | 30 | 27/07/2018 | 27/07/2018 | P | 4
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5
Next Acting rows are subsumed into the second Principal row:
12/12/2016 | 08/12/2017SELECT t2.*,
ROW_NUMBER() OVER (ORDER BY t2.StartDate) P2
FROM tbl t2
WHERE t2.PosType = 'A'
AND t2.StartDate >= '20161212'
AND t2.StartDate <= '20171208'
ORDER BY t2.Employee, t2.StartDate
;
GOEmployee | Position | StartDate | EndDate | PosType | P2
-------: | -------: | :----------| :----------| :------ | :--
1 | 20 | 31/01/2017 | 02/02/2017 | A | 1
1 | 20 | 29/03/2017 | 31/03/2017 | A | 2
1 | 20 | 19/04/2017 | 21/04/2017 | A | 3
1 | 20 | 22/06/2017 | 23/06/2017 | A | 4
1 | 20 | 29/06/2017 | 30/06/2017 | A | 5
1 | 20 | 03/10/2017 | 06/10/2017 | A | 6
Now let me generate a new dataset, for each Primary I'll add the corresponding Acting(s) using an OUTER JOIN, this outer join returns all Primary(s) even if they haven't subsumed Acting(s).
SELECT t1.*, COALESCE(t3.P2, 0) P2, P, t3.PrevED, t3.SD, t3.ED, NextSD, t3.PT
INTO #mydata
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) as P1
FROM tbl
WHERE PosType = 'P'
) t1
OUTER APPLY (SELECT ROW_NUMBER() OVER (ORDER BY t2.StartDate) as P2,
LAG(t2.EndDate) OVER (ORDER BY t2.StartDate) as PrevED,
LEAD(t2.StartDate) OVER (ORDER BY t2.StartDate) as NextSD,
t2.Position as P, t2.StartDate as SD, t2.EndDate as ED, t2.PosType as PT
FROM tbl t2
WHERE t2.Employee = t1.Employee
AND t2.PosType = 'A'
AND t2.StartDate >= t1.StartDate
AND t2.StartDate <= t1.EndDate
) t3
WHERE t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate, t3.SD
;
CREATE CLUSTERED INDEX PK_Mydata ON #mydata(Employee, P1, P2);
SELECT * FROM #mydata ORDER BY Employee, P1, P2;
GONOTE: I've used a temporary table plus and index, but you could try by using a CTE. You should check performance of both options.
This is the new scenario:
Employee | Position | StartDate | EndDate | PosType | P1 | P2 | P | PrevED | SD | ED | NextSD | PT
-------: | -------: | :----------| :----------| :------ | :- | :- | ---: | :----------| :----------| :----------| :----------| :---
1 | 30 | 10/12/2016 | 11/12/2016 | P | 1 | 0 | null | null | null | null | null | null
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 1 | 20 | null | 31/01/2017 | 02/02/2017 | 29/03/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 2 | 20 | 02/02/2017 | 29/03/2017 | 31/03/2017 | 19/04/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 3 | 20 | 31/03/2017 | 19/04/2017 | 21/04/2017 | 22/06/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 4 | 20 | 21/04/2017 | 22/06/2017 | 23/06/2017 | 29/06/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 5 | 20 | 23/06/2017 | 29/06/2017 | 30/06/2017 | 03/10/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 6 | 20 | 30/06/2017 | 03/10/2017 | 06/10/2017 | null | A
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3 | 1 | 20 | null | 12/02/2018 | 02/03/2018 | 19/07/2018 | A
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3 | 2 | 20 | 02/03/2018 | 19/07/2018 | 20/07/2018 | null | A
1 | 30 | 27/07/2018 | 27/07/2018 | P | 4 | 0 | null | null | null | null | null | null
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5 | 1 | 20 | null | 04/09/2018 | 05/09/2018 | 12/11/2018 | A
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5 | 2 | 20 | 05/09/2018 | 12/11/2018 | 13/11/2018 | 19/11/2018 | A
1 | 30 | 28/07/2018
Code Snippets
SELECT t1.*,
ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) P1
FROM tbl t1
WHERE t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate
;
GOSELECT t2.*,
ROW_NUMBER() OVER (ORDER BY t2.StartDate) P2
FROM tbl t2
WHERE t2.PosType = 'A'
AND t2.StartDate >= '20161212'
AND t2.StartDate <= '20171208'
ORDER BY t2.Employee, t2.StartDate
;
GOSELECT t1.*, COALESCE(t3.P2, 0) P2, P, t3.PrevED, t3.SD, t3.ED, NextSD, t3.PT
INTO #mydata
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) as P1
FROM tbl
WHERE PosType = 'P'
) t1
OUTER APPLY (SELECT ROW_NUMBER() OVER (ORDER BY t2.StartDate) as P2,
LAG(t2.EndDate) OVER (ORDER BY t2.StartDate) as PrevED,
LEAD(t2.StartDate) OVER (ORDER BY t2.StartDate) as NextSD,
t2.Position as P, t2.StartDate as SD, t2.EndDate as ED, t2.PosType as PT
FROM tbl t2
WHERE t2.Employee = t1.Employee
AND t2.PosType = 'A'
AND t2.StartDate >= t1.StartDate
AND t2.StartDate <= t1.EndDate
) t3
WHERE t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate, t3.SD
;
CREATE CLUSTERED INDEX PK_Mydata ON #mydata(Employee, P1, P2);
SELECT * FROM #mydata ORDER BY Employee, P1, P2;
GOSELECT *
FROM
(
-- Primary(s) without Acting(s)
SELECT Employee, Position, StartDate, EndDate, PosType
FROM #mydata
WHERE Employee = 1
AND P2 = 0
--
-- plus all Acting(s)
--
UNION
SELECT Employee, P, SD, ED, PT
FROM #mydata
WHERE Employee = 1
AND P2 <> 0
--
-- plus all (left) Gaps
--
UNION
SELECT Employee,
Position,
IIF(PrevED IS NULL, StartDate, DATEADD(day, 1, PrevED)),
DATEADD(day, -1, SD),
PosType
FROM #mydata
WHERE Employee = 1
AND P2 <> 0
AND DATEDIFF(day, IIF(PrevED IS NULL, StartDate, DATEADD(day, 1, PrevED)), SD) > 0
UNION
--
-- plus all (last right) Gap
--
SELECT Employee,
Position,
DATEADD(day, 1, ED),
EndDate,
PosType
FROM #mydata
WHERE Employee = 1
AND P2 <> 0
AND NextSD IS NULL
AND DATEDIFF(day, ED, EndDate) > 0
) T
ORDER BY 1, 3;
GOContext
StackExchange Database Administrators Q#224249, answer score: 3
Revisions (0)
No revisions yet.