HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Computing the set difference of tables of intervals

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
tablesthedifferenceintervalssetcomputing

Problem

I often run into the following problem. I have two tables of intervals. They are bounded by dates (with no time component). Within each table the intervals do not overlap.

StartTs
EndTs

2015-01-03
2015-03-02

2015-03-05
2015-04-01

StartTs
EndTs

2015-01-07
2015-02-27

2015-03-01
2015-03-13

2016-01-01
2016-01-02

And I want to find the set difference of the the two tables, i.e., the intervals representing the time in the first table not in the second table.

Desired output for the dummy example above:

StartTs
EndTs

2015-01-03
2015-01-06

2015-02-28
2015-02-28

2015-03-14
2015-04-01

i.e. if the first table's dates are marked in yellow below and the ranges from the second table surrounded with a box I'd be looking for the contiguous ranges of unboxed yellow dates.

I'm currently treating intervals as inclusive on both ends, and using DateTime for my timestamps. My current approach is to take the complement of the second table via a triple self-join (yuck) and then intersecting the result with the first table via a join. Not fun.

Is there a better approach?

Solution

Given the various simplifying assumptions (Dates only and no overlapping intervals within a table) I might be minded to keep things simple.

First create an auxiliary numbers table (zero based)

CREATE TABLE dbo.SmallNumbers
(
Number SMALLINT PRIMARY KEY
)

INSERT INTO dbo.SmallNumbers
SELECT TOP 32768 ROW_NUMBER() OVER (ORDER BY @@SPID)-1 AS Number
FROM sys.all_columns c1, sys.all_columns c2


Then the code below expands out the ranges to their component dates, uses EXCEPT to find the difference and a gaps and islands technique to collapse the ranges back down. As each date will only be represented once we are likely only talking about expanding out to a few thousand dates per decade covered.

WITH UnmatchedDates(Date) AS
(
SELECT DATEADD(DAY,N.Number, StartTs)
FROM Table1
JOIN dbo.SmallNumbers N ON N.Number <= DATEDIFF(DAY, StartTs, EndTs)
EXCEPT 
SELECT DATEADD(DAY,N.Number, StartTs)
FROM Table2
JOIN dbo.SmallNumbers N ON N.Number <= DATEDIFF(DAY, StartTs, EndTs)
),
UnmatchedDatesWithGrp(Date, Grp) AS
(
SELECT Date, 
       DATEDIFF(DAY, 0, Date) - ROW_NUMBER() OVER (ORDER BY Date)
FROM UnmatchedDates
)
SELECT StartTs = MIN(Date), 
         EndTs = MAX(Date)
FROM UnmatchedDatesWithGrp
GROUP BY Grp

Code Snippets

CREATE TABLE dbo.SmallNumbers
(
Number SMALLINT PRIMARY KEY
)

INSERT INTO dbo.SmallNumbers
SELECT TOP 32768 ROW_NUMBER() OVER (ORDER BY @@SPID)-1 AS Number
FROM sys.all_columns c1, sys.all_columns c2
WITH UnmatchedDates(Date) AS
(
SELECT DATEADD(DAY,N.Number, StartTs)
FROM Table1
JOIN dbo.SmallNumbers N ON N.Number <= DATEDIFF(DAY, StartTs, EndTs)
EXCEPT 
SELECT DATEADD(DAY,N.Number, StartTs)
FROM Table2
JOIN dbo.SmallNumbers N ON N.Number <= DATEDIFF(DAY, StartTs, EndTs)
),
UnmatchedDatesWithGrp(Date, Grp) AS
(
SELECT Date, 
       DATEDIFF(DAY, 0, Date) - ROW_NUMBER() OVER (ORDER BY Date)
FROM UnmatchedDates
)
SELECT StartTs = MIN(Date), 
         EndTs = MAX(Date)
FROM UnmatchedDatesWithGrp
GROUP BY Grp

Context

StackExchange Database Administrators Q#305832, answer score: 5

Revisions (0)

No revisions yet.