HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Finding gaps on intersection dates

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
datesintersectiongapsfinding

Problem

This is a problem I've been trying to work through on and off for the past month, and haven't made any progress. Essentially, I am working with a HR system that stores two major types of position for employees: Primary (P) and Acting (A). The position types aren't hugely relevant, but are important to know to understand the way the data is structured. An employee will always be in a Primary position, but may be put into an Acting position temporarily while it is vacant, or while the Primary person for that role is on leave.

The way that this is represented in the system is diagrammed below (as an example and not to scale):

P (2018-07-28 - 9999-99-99) 
A (2018-09-04 - 2018-09-05)             
A (2018-11-12 - 2018-11-13)                                          
A (2018-11-19 - 2018-11-20)                                                  


Due to a set of data I'm trying to join the position information with, I need it to be intersected to look like this, where the Acting position always takes precedence over the Primary position.

P 2018-07-28 - 2018-09-03 
A 2018-09-04 - 2018-09-05             
P 2018-09-06 - 2018-11-11                  
A 2018-11-12 - 2018-11-13                                           
P 2018-11-14 - 2018-11-18                                                
A 2018-11-19 - 2018-11-20                                                  

P 2018-11-21 - 9999-99-99                                                  
     


Below is a representation of the data directly from the database, to give an idea of what it looks like overall:

```
Employee Position Start Date End Date Position Type
1 30 2016-12-10 2016-12-11 P
1 30 2016-12-12 2017-12-08 P
1 20 2017-01-31 2017-02-02 A
1 20 2017-03-29 2017-03-31 A
1 20 2017-04-19 2017-04-21 A
1 20 2017-06-22 2017-06-23 A
1 20 2017-06-29 2017-06

Solution

First, I think the link you had supplied doesn't fit for your question because it is about combining overlapped dates, and you need to generate (GAPs) new dates.

I would divide your data in two groups, those corresponding to Primary(s) and those that belong to Acting(s).

SELECT   t1.*,
         ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) P1
FROM     tbl t1
WHERE    t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate
;
GO


Employee | Position | StartDate | EndDate | PosType | P1
-------: | -------: | :----------| :----------| :------ | :-
1 | 30 | 10/12/2016 | 11/12/2016 | P | 1
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3
1 | 30 | 27/07/2018 | 27/07/2018 | P | 4
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5

Next Acting rows are subsumed into the second Principal row: 12/12/2016 | 08/12/2017

SELECT   t2.*,
         ROW_NUMBER() OVER (ORDER BY t2.StartDate) P2
FROM     tbl t2
WHERE    t2.PosType = 'A'
         AND t2.StartDate >= '20161212'
         AND t2.StartDate <= '20171208'
ORDER BY t2.Employee, t2.StartDate
;
GO


Employee | Position | StartDate | EndDate | PosType | P2
-------: | -------: | :----------| :----------| :------ | :--
1 | 20 | 31/01/2017 | 02/02/2017 | A | 1
1 | 20 | 29/03/2017 | 31/03/2017 | A | 2
1 | 20 | 19/04/2017 | 21/04/2017 | A | 3
1 | 20 | 22/06/2017 | 23/06/2017 | A | 4
1 | 20 | 29/06/2017 | 30/06/2017 | A | 5
1 | 20 | 03/10/2017 | 06/10/2017 | A | 6

Now let me generate a new dataset, for each Primary I'll add the corresponding Acting(s) using an OUTER JOIN, this outer join returns all Primary(s) even if they haven't subsumed Acting(s).

SELECT   t1.*, COALESCE(t3.P2, 0) P2, P, t3.PrevED, t3.SD, t3.ED, NextSD, t3.PT
INTO #mydata
FROM     (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) as P1
          FROM   tbl 
          WHERE  PosType = 'P'
          ) t1
OUTER APPLY (SELECT   ROW_NUMBER() OVER (ORDER BY t2.StartDate) as P2,
                      LAG(t2.EndDate) OVER (ORDER BY t2.StartDate) as PrevED,
                      LEAD(t2.StartDate) OVER (ORDER BY t2.StartDate) as NextSD,
                      t2.Position as P, t2.StartDate as SD, t2.EndDate as ED, t2.PosType as PT
             FROM     tbl t2
             WHERE    t2.Employee = t1.Employee
                      AND t2.PosType = 'A'
                      AND t2.StartDate >= t1.StartDate
                      AND t2.StartDate <= t1.EndDate
            ) t3
WHERE    t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate, t3.SD
;

CREATE CLUSTERED INDEX PK_Mydata ON #mydata(Employee, P1, P2);

SELECT * FROM #mydata ORDER BY Employee, P1, P2;
GO



NOTE: I've used a temporary table plus and index, but you could try by using a CTE. You should check performance of both options.

This is the new scenario:

Employee | Position | StartDate | EndDate | PosType | P1 | P2 | P | PrevED | SD | ED | NextSD | PT
-------: | -------: | :----------| :----------| :------ | :- | :- | ---: | :----------| :----------| :----------| :----------| :---
1 | 30 | 10/12/2016 | 11/12/2016 | P | 1 | 0 | null | null | null | null | null | null
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 1 | 20 | null | 31/01/2017 | 02/02/2017 | 29/03/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 2 | 20 | 02/02/2017 | 29/03/2017 | 31/03/2017 | 19/04/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 3 | 20 | 31/03/2017 | 19/04/2017 | 21/04/2017 | 22/06/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 4 | 20 | 21/04/2017 | 22/06/2017 | 23/06/2017 | 29/06/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 5 | 20 | 23/06/2017 | 29/06/2017 | 30/06/2017 | 03/10/2017 | A
1 | 30 | 12/12/2016 | 08/12/2017 | P | 2 | 6 | 20 | 30/06/2017 | 03/10/2017 | 06/10/2017 | null | A
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3 | 1 | 20 | null | 12/02/2018 | 02/03/2018 | 19/07/2018 | A
1 | 30 | 09/12/2017 | 26/07/2018 | P | 3 | 2 | 20 | 02/03/2018 | 19/07/2018 | 20/07/2018 | null | A
1 | 30 | 27/07/2018 | 27/07/2018 | P | 4 | 0 | null | null | null | null | null | null
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5 | 1 | 20 | null | 04/09/2018 | 05/09/2018 | 12/11/2018 | A
1 | 30 | 28/07/2018 | 31/12/9999 | P | 5 | 2 | 20 | 05/09/2018 | 12/11/2018 | 13/11/2018 | 19/11/2018 | A
1 | 30 | 28/07/2018

Code Snippets

SELECT   t1.*,
         ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) P1
FROM     tbl t1
WHERE    t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate
;
GO
SELECT   t2.*,
         ROW_NUMBER() OVER (ORDER BY t2.StartDate) P2
FROM     tbl t2
WHERE    t2.PosType = 'A'
         AND t2.StartDate >= '20161212'
         AND t2.StartDate <= '20171208'
ORDER BY t2.Employee, t2.StartDate
;
GO
SELECT   t1.*, COALESCE(t3.P2, 0) P2, P, t3.PrevED, t3.SD, t3.ED, NextSD, t3.PT
INTO #mydata
FROM     (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) as P1
          FROM   tbl 
          WHERE  PosType = 'P'
          ) t1
OUTER APPLY (SELECT   ROW_NUMBER() OVER (ORDER BY t2.StartDate) as P2,
                      LAG(t2.EndDate) OVER (ORDER BY t2.StartDate) as PrevED,
                      LEAD(t2.StartDate) OVER (ORDER BY t2.StartDate) as NextSD,
                      t2.Position as P, t2.StartDate as SD, t2.EndDate as ED, t2.PosType as PT
             FROM     tbl t2
             WHERE    t2.Employee = t1.Employee
                      AND t2.PosType = 'A'
                      AND t2.StartDate >= t1.StartDate
                      AND t2.StartDate <= t1.EndDate
            ) t3
WHERE    t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate, t3.SD
;

CREATE CLUSTERED INDEX PK_Mydata ON #mydata(Employee, P1, P2);

SELECT * FROM #mydata ORDER BY Employee, P1, P2;
GO
SELECT *
FROM
    (
    -- Primary(s) without Acting(s)
    SELECT Employee, Position, StartDate, EndDate, PosType
    FROM   #mydata
    WHERE  Employee = 1
           AND P2 = 0
    --
    -- plus all Acting(s)
    --
    UNION
    SELECT Employee, P, SD, ED, PT
    FROM   #mydata
    WHERE  Employee = 1
           AND P2 <> 0
    --
    -- plus all (left) Gaps
    --
    UNION
    SELECT Employee, 
           Position, 
           IIF(PrevED IS NULL, StartDate, DATEADD(day, 1, PrevED)), 
           DATEADD(day, -1, SD),
           PosType
    FROM   #mydata
    WHERE  Employee = 1
           AND P2 <> 0
           AND DATEDIFF(day, IIF(PrevED IS NULL, StartDate, DATEADD(day, 1, PrevED)), SD) > 0
    UNION
    --
    -- plus all (last right) Gap
    --
    SELECT Employee, 
           Position, 
           DATEADD(day, 1, ED),
           EndDate,
           PosType
    FROM   #mydata
    WHERE  Employee = 1
           AND P2 <> 0
           AND NextSD IS NULL
           AND DATEDIFF(day, ED, EndDate) > 0
    ) T
ORDER BY 1, 3;
GO

Context

StackExchange Database Administrators Q#224249, answer score: 3

Revisions (0)

No revisions yet.