patternsqlMinor
Select all overlapping ranges from starting range
Viewed 0 times
startingallrangerangesoverlappingselectfrom
Problem
My question is similar to this one with (I think) significant enough difference. I have a base range and I want to find all other ranges in a table that conflict with it and each other. or rather the items that form the ranges, but it doesn't really make a difference.
The line with a asterisk is the starting range. The ranges 1,2 and 3 are the ones that are supposed to expand it. The result range should be the X.
I have written this:
But after some fiddling realized this can't really work, since the 2nd SELECT doesn't know what
We have previously calculated this in client in .NET via set of recursive functions, but at O(N^2) it was stupidly slow. The intent here is to move the calculations to the server as well as optimize the calculation.
Is what I'm trying to do here at all viable?
SQL Fiddle. I'm having some trouble getting the query to run as it did on my machine (where I used less simple data structure), but at least I added example data. I will keep trying to get it to produce the same result I got on my machine.
With input range from
The line with a asterisk is the starting range. The ranges 1,2 and 3 are the ones that are supposed to expand it. The result range should be the X.
1 |
3 | ===1==== ====
5 | ==2== ====*==== ====
6 | ====3==== =====
--+-------------------------------------
| ||I have written this:
WITH cte
AS (
SELECT DATA1.ID, DATA1.STARTDATE, DATA1.ENDDATE
FROM DATA1
WHERE
DATA1.ID = @PARID AND
DATA1.STARTDATE > @STARTDATE AND
DATA1.ENDDATE DATA1.ENDDATE)
)
SELECT *
FROM cteBut after some fiddling realized this can't really work, since the 2nd SELECT doesn't know what
STARTDATE and ENDDATE it should use from the cte. And I can't use subqueries in recursive SQL.We have previously calculated this in client in .NET via set of recursive functions, but at O(N^2) it was stupidly slow. The intent here is to move the calculations to the server as well as optimize the calculation.
Is what I'm trying to do here at all viable?
SQL Fiddle. I'm having some trouble getting the query to run as it did on my machine (where I used less simple data structure), but at least I added example data. I will keep trying to get it to produce the same result I got on my machine.
With input range from
2018-08-24 12:00:00 to 2018-08-31 12:00:00, the correct output should be IDs 2, 3, 4, 6.Solution
A straightforward recursive solution splits the task in two parts:
Details follow:
Example code:
Example code:
With suitable indexing, both queries use an execution plan like:
The results given the sample data provided are:
╔════╦═════════════════════════╗
║ ID ║ STARTDATE ║
╠════╬═════════════════════════╣
║ 3 ║ 2018-08-24 12:00:00.000 ║
║ 6 ║ 2018-08-23 12:00:00.000 ║
║ 2 ║ 2018-08-22 12:00:00.000 ║
╚════╩═════════════════════════╝
╔════╦═════════════════════════╗
║ ID ║ ENDDATE ║
╠════╬═════════════════════════╣
║ 3 ║ 2018-08-29 12:00:00.000 ║
║ 6 ║ 2018-08-30 12:00:00.000 ║
║ 4 ║ 2018-09-02 12:00:00.000 ║
╚════╩═════════════════════════╝
So IDs 2, 3, 4, and 6 were used. The final range is
Note: The code above includes a tie-break on dates (using the
The indexes used were:
Demo: db<>fiddle
Note that truly optimal indexing for these types of queries can be difficult. Even so, the approach above often performs well for many practical tasks in this area.
Until proper support for interval queries and predicates is added to SQL Server, higher performance solutions require significant extra work. See:
- Find the earliest start date by following the chain of overlapping ranges
- Find the latest end date by following the chain of overlapping ranges
Details follow:
- Follow the chain of overlaps leading to the earliest start date
- Find the starting range
- Find the overlapping range with the latest earlier start date than current
- Go to step 2
Example code:
DECLARE @STARTDATE DATETIME = '2018-08-24 12:00:00';
DECLARE @ENDDATE DATETIME = '2018-08-31 12:00:00';
WITH MinStart AS
(
-- Starting range
SELECT TOP (1)
D.ID,
D.STARTDATE
FROM dbo.DATA1 AS D
WHERE
D.STARTDATE >= @STARTDATE
AND D.ENDDATE = R.STARTDATE
AND D.STARTDATE < R.STARTDATE
) AS P1
WHERE
-- Highest start date only
P1.RN = 1
)
SELECT
MS.ID,
MS.STARTDATE
FROM MinStart AS MS
OPTION (MAXRECURSION 0);- Follow the chain of overlaps leading to the latest end date
- Find the starting range
- Find the overlapping range with the earliest later end date than current
- Go to step 2
Example code:
DECLARE @STARTDATE DATETIME = '2018-08-24 12:00:00';
DECLARE @ENDDATE DATETIME = '2018-08-31 12:00:00';
WITH MaxEnd AS
(
-- Starting range
SELECT TOP (1)
D.ID,
D.ENDDATE
FROM dbo.DATA1 AS D
WHERE
D.STARTDATE >= @STARTDATE
AND D.ENDDATE R.ENDDATE
AND D.STARTDATE < R.ENDDATE
) AS P1
WHERE
-- Lowest end date only
P1.RN = 1
)
SELECT
ME.ID,
ME.ENDDATE
FROM MaxEnd AS ME
OPTION (MAXRECURSION 0);With suitable indexing, both queries use an execution plan like:
The results given the sample data provided are:
╔════╦═════════════════════════╗
║ ID ║ STARTDATE ║
╠════╬═════════════════════════╣
║ 3 ║ 2018-08-24 12:00:00.000 ║
║ 6 ║ 2018-08-23 12:00:00.000 ║
║ 2 ║ 2018-08-22 12:00:00.000 ║
╚════╩═════════════════════════╝
╔════╦═════════════════════════╗
║ ID ║ ENDDATE ║
╠════╬═════════════════════════╣
║ 3 ║ 2018-08-29 12:00:00.000 ║
║ 6 ║ 2018-08-30 12:00:00.000 ║
║ 4 ║ 2018-09-02 12:00:00.000 ║
╚════╩═════════════════════════╝
So IDs 2, 3, 4, and 6 were used. The final range is
2018-08-22 12:00:00.000 (lowest start date found) to 2018-09-02 12:00:00.000 (highest end date found).Note: The code above includes a tie-break on dates (using the
ID column) due to duplicates in the sample data. This may, or may not, be required to solve the real problem.The indexes used were:
CREATE TABLE dbo.DATA1
(
ID integer PRIMARY KEY,
STARTDATE datetime NOT NULL,
ENDDATE datetime NOT NULL,
);
CREATE UNIQUE INDEX i1 ON dbo.DATA1 (STARTDATE DESC, ID ASC) INCLUDE (ENDDATE);
CREATE UNIQUE INDEX i2 ON dbo.DATA1 (ENDDATE ASC, ID DESC) INCLUDE (STARTDATE);Demo: db<>fiddle
Note that truly optimal indexing for these types of queries can be difficult. Even so, the approach above often performs well for many practical tasks in this area.
Until proper support for interval queries and predicates is added to SQL Server, higher performance solutions require significant extra work. See:
- Interval Queries in SQL Server Part 4 by Dejan Sarka
- Interval Queries in SQL Server by Itzik Ben-Gan
Code Snippets
DECLARE @STARTDATE DATETIME = '2018-08-24 12:00:00';
DECLARE @ENDDATE DATETIME = '2018-08-31 12:00:00';
WITH MinStart AS
(
-- Starting range
SELECT TOP (1)
D.ID,
D.STARTDATE
FROM dbo.DATA1 AS D
WHERE
D.STARTDATE >= @STARTDATE
AND D.ENDDATE <= @ENDDATE
ORDER BY
D.ID ASC
UNION ALL
SELECT
P1.ID,
P1.STARTDATE
FROM
(
-- Overlapping ranges with an earlier start date
-- Numbered starting with the highest start date
-- Tie-break dates using ID
SELECT
RN = ROW_NUMBER() OVER (
ORDER BY D.STARTDATE DESC, D.ID ASC),
D.ID,
D.STARTDATE
FROM MinStart AS R
JOIN dbo.DATA1 AS D
ON D.ENDDATE >= R.STARTDATE
AND D.STARTDATE < R.STARTDATE
) AS P1
WHERE
-- Highest start date only
P1.RN = 1
)
SELECT
MS.ID,
MS.STARTDATE
FROM MinStart AS MS
OPTION (MAXRECURSION 0);DECLARE @STARTDATE DATETIME = '2018-08-24 12:00:00';
DECLARE @ENDDATE DATETIME = '2018-08-31 12:00:00';
WITH MaxEnd AS
(
-- Starting range
SELECT TOP (1)
D.ID,
D.ENDDATE
FROM dbo.DATA1 AS D
WHERE
D.STARTDATE >= @STARTDATE
AND D.ENDDATE <= @ENDDATE
ORDER BY
D.ID ASC
UNION ALL
SELECT
P1.ID,
P1.ENDDATE
FROM
(
-- Overlapping ranges with a later end date
-- Numbered starting with the lowest end date
-- Tie-break dates using ID
SELECT
RN = ROW_NUMBER() OVER (
ORDER BY D.ENDDATE ASC, D.ID ASC),
D.ID,
D.ENDDATE
FROM MaxEnd AS R
JOIN dbo.DATA1 AS D
ON D.ENDDATE > R.ENDDATE
AND D.STARTDATE < R.ENDDATE
) AS P1
WHERE
-- Lowest end date only
P1.RN = 1
)
SELECT
ME.ID,
ME.ENDDATE
FROM MaxEnd AS ME
OPTION (MAXRECURSION 0);CREATE TABLE dbo.DATA1
(
ID integer PRIMARY KEY,
STARTDATE datetime NOT NULL,
ENDDATE datetime NOT NULL,
);
CREATE UNIQUE INDEX i1 ON dbo.DATA1 (STARTDATE DESC, ID ASC) INCLUDE (ENDDATE);
CREATE UNIQUE INDEX i2 ON dbo.DATA1 (ENDDATE ASC, ID DESC) INCLUDE (STARTDATE);Context
StackExchange Database Administrators Q#215967, answer score: 5
Revisions (0)
No revisions yet.