patternsqlMinor
Computing the set difference of tables of intervals
Viewed 0 times
tablesthedifferenceintervalssetcomputing
Problem
I often run into the following problem. I have two tables of intervals. They are bounded by dates (with no time component). Within each table the intervals do not overlap.
StartTs
EndTs
2015-01-03
2015-03-02
2015-03-05
2015-04-01
StartTs
EndTs
2015-01-07
2015-02-27
2015-03-01
2015-03-13
2016-01-01
2016-01-02
And I want to find the set difference of the the two tables, i.e., the intervals representing the time in the first table not in the second table.
Desired output for the dummy example above:
StartTs
EndTs
2015-01-03
2015-01-06
2015-02-28
2015-02-28
2015-03-14
2015-04-01
i.e. if the first table's dates are marked in yellow below and the ranges from the second table surrounded with a box I'd be looking for the contiguous ranges of unboxed yellow dates.
I'm currently treating intervals as inclusive on both ends, and using DateTime for my timestamps. My current approach is to take the complement of the second table via a triple self-join (yuck) and then intersecting the result with the first table via a join. Not fun.
Is there a better approach?
StartTs
EndTs
2015-01-03
2015-03-02
2015-03-05
2015-04-01
StartTs
EndTs
2015-01-07
2015-02-27
2015-03-01
2015-03-13
2016-01-01
2016-01-02
And I want to find the set difference of the the two tables, i.e., the intervals representing the time in the first table not in the second table.
Desired output for the dummy example above:
StartTs
EndTs
2015-01-03
2015-01-06
2015-02-28
2015-02-28
2015-03-14
2015-04-01
i.e. if the first table's dates are marked in yellow below and the ranges from the second table surrounded with a box I'd be looking for the contiguous ranges of unboxed yellow dates.
I'm currently treating intervals as inclusive on both ends, and using DateTime for my timestamps. My current approach is to take the complement of the second table via a triple self-join (yuck) and then intersecting the result with the first table via a join. Not fun.
Is there a better approach?
Solution
Given the various simplifying assumptions (Dates only and no overlapping intervals within a table) I might be minded to keep things simple.
First create an auxiliary numbers table (zero based)
Then the code below expands out the ranges to their component dates, uses
First create an auxiliary numbers table (zero based)
CREATE TABLE dbo.SmallNumbers
(
Number SMALLINT PRIMARY KEY
)
INSERT INTO dbo.SmallNumbers
SELECT TOP 32768 ROW_NUMBER() OVER (ORDER BY @@SPID)-1 AS Number
FROM sys.all_columns c1, sys.all_columns c2Then the code below expands out the ranges to their component dates, uses
EXCEPT to find the difference and a gaps and islands technique to collapse the ranges back down. As each date will only be represented once we are likely only talking about expanding out to a few thousand dates per decade covered.WITH UnmatchedDates(Date) AS
(
SELECT DATEADD(DAY,N.Number, StartTs)
FROM Table1
JOIN dbo.SmallNumbers N ON N.Number <= DATEDIFF(DAY, StartTs, EndTs)
EXCEPT
SELECT DATEADD(DAY,N.Number, StartTs)
FROM Table2
JOIN dbo.SmallNumbers N ON N.Number <= DATEDIFF(DAY, StartTs, EndTs)
),
UnmatchedDatesWithGrp(Date, Grp) AS
(
SELECT Date,
DATEDIFF(DAY, 0, Date) - ROW_NUMBER() OVER (ORDER BY Date)
FROM UnmatchedDates
)
SELECT StartTs = MIN(Date),
EndTs = MAX(Date)
FROM UnmatchedDatesWithGrp
GROUP BY GrpCode Snippets
CREATE TABLE dbo.SmallNumbers
(
Number SMALLINT PRIMARY KEY
)
INSERT INTO dbo.SmallNumbers
SELECT TOP 32768 ROW_NUMBER() OVER (ORDER BY @@SPID)-1 AS Number
FROM sys.all_columns c1, sys.all_columns c2WITH UnmatchedDates(Date) AS
(
SELECT DATEADD(DAY,N.Number, StartTs)
FROM Table1
JOIN dbo.SmallNumbers N ON N.Number <= DATEDIFF(DAY, StartTs, EndTs)
EXCEPT
SELECT DATEADD(DAY,N.Number, StartTs)
FROM Table2
JOIN dbo.SmallNumbers N ON N.Number <= DATEDIFF(DAY, StartTs, EndTs)
),
UnmatchedDatesWithGrp(Date, Grp) AS
(
SELECT Date,
DATEDIFF(DAY, 0, Date) - ROW_NUMBER() OVER (ORDER BY Date)
FROM UnmatchedDates
)
SELECT StartTs = MIN(Date),
EndTs = MAX(Date)
FROM UnmatchedDatesWithGrp
GROUP BY GrpContext
StackExchange Database Administrators Q#305832, answer score: 5
Revisions (0)
No revisions yet.