patternsqlMinor
Improve The Performance Of Multiple Date Range Predicates
Viewed 0 times
predicatestherangedateimproveperformancemultiple
Problem
let's say
You have a stored procedure that accepts arrays of datetimes, which are loaded into a temp table, and used to filter on a datetime column in a table.
What's the most efficient way to write a query to perform the filtering?
setup
query
The best I've been able to get is using
Which results in a rather sad-looking execution plan:
Nested Loops is the only join operator available, since we don't have an equality predicate.
What I’m looking for is alternative syntax that produces a different type of join.
Thanks!
You have a stored procedure that accepts arrays of datetimes, which are loaded into a temp table, and used to filter on a datetime column in a table.
- There can be any number of values inserted as begin and end dates.
- Date ranges may overlap sometimes, but it's not a condition that I'd count on regularly.
- It's also possible for dates with times to be supplied.
What's the most efficient way to write a query to perform the filtering?
setup
USE StackOverflow2013;
CREATE TABLE
#d
(
dfrom datetime,
dto datetime,
PRIMARY KEY (dfrom, dto)
)
INSERT
#d
(
dfrom,
dto
)
SELECT
dfrom = '2013-11-20',
dto = '2013-12-05'
UNION ALL
SELECT
dfrom = '2013-11-27',
dto = '2013-12-12';
CREATE INDEX
p
ON dbo.Posts
(CreationDate)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);query
The best I've been able to get is using
EXISTS like so:SELECT
c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
(
SELECT
1/0
FROM #d AS d
WHERE p.CreationDate BETWEEN d.dfrom
AND d.dto
);Which results in a rather sad-looking execution plan:
Nested Loops is the only join operator available, since we don't have an equality predicate.
What I’m looking for is alternative syntax that produces a different type of join.
Thanks!
Solution
Join
You may find a join gives adequate performance, despite still using nested loops. This is a variation on David Browne's updated answer:
That runs in around 150ms for me.
Remove overlaps and join
If you have significant overlapping rows, it may be worth reducing them to distinct, non-overlapping ranges, as outlined in Martin Smith's answer. My implementation of a similar idea is:
That runs in about 40ms for me.
Dynamic seek with no join
Another approach is to generate literal ranges dynamically:
With the sample data, that produces:
Which SQL Server simplifies to a single range seek:
That runs in about 35ms for me.
In general, the optimizer will simplify the ranges as much as possible (much like a Merge Interval would do). There doesn't appear to be a limit on the number of separate range seeks in a single Index Seek operator. I got bored after 1440 seeks.
Indexed view
Rather than count rows over and over again, it may be helpful to store and maintain bucketized counts in an indexed view. The following implementation uses hour granularity:
The view takes around 2s to create on my laptop. At hour granularity, the indexed view holds 47,469 rows for my copy of the StackOverflow2013 database's Posts table with 17,142,169 rows.
The majority of the work can be done from the view. Only part-hour periods at the start and end of any range not covered by any other range needs to be processed separately. For example:
```
-- Helper inline function
CREATE OR ALTER FUNCTION dbo.RoundToHour (@d datetime)
RETURNS table
AS
RETURN
SELECT
HourBucket =
DATEADD(HOUR,
DATEDIFF(HOUR,
CONVERT(datetime, '2000010
You may find a join gives adequate performance, despite still using nested loops. This is a variation on David Browne's updated answer:
SELECT
NumRows = COUNT_BIG(DISTINCT P.Id)
FROM #d AS D
JOIN dbo.Posts AS P
ON P.CreationDate BETWEEN D.dfrom AND D.dto;That runs in around 150ms for me.
Remove overlaps and join
If you have significant overlapping rows, it may be worth reducing them to distinct, non-overlapping ranges, as outlined in Martin Smith's answer. My implementation of a similar idea is:
WITH
Intervals AS
(
SELECT
IntervalStart =
ISNULL
(
LAG(Q1.NextStart) OVER (ORDER BY Q1.ThisFrom),
Q1.FirstFrom
),
IntervalEnd =
IIF
(
Q1.NextStart IS NOT NULL,
Q1.ThisEnd,
Q1.LastEnd
)
FROM
(
SELECT
ThisFrom = D.dfrom,
ThisEnd = D.dto,
-- Remember the start of the next row because that row
-- may get filtered out by the outer WHERE clase if it
-- is not also the end of an interval.
NextStart = LEAD(D.dfrom) OVER (
ORDER BY D.dfrom, D.dto),
-- Start point of the first interval
FirstFrom = MIN(D.dfrom) OVER (
ORDER BY D.dfrom, D.dto
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
-- End point of the last interval
LastEnd = MAX(D.dto) OVER (
ORDER BY D.dfrom, D.dto
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM #d AS D
WHERE
-- Valid intervals only
D.dto >= D.dfrom
) AS Q1
WHERE
-- Interval ends only
Q1.NextStart > Q1.ThisEnd
OR Q1.NextStart IS NULL
)
SELECT COUNT_BIG(*)
FROM dbo.Posts AS P
JOIN Intervals AS N
ON P.CreationDate BETWEEN N.IntervalStart AND N.IntervalEnd;That runs in about 40ms for me.
Dynamic seek with no join
Another approach is to generate literal ranges dynamically:
DECLARE @SQL nvarchar(max) =
N'
SELECT Rows = COUNT_BIG(*)
FROM dbo.Posts AS P
WHERE 0 = 1
';
SELECT @SQL +=
STRING_AGG
(
CONCAT
(
CONVERT(nvarchar(max), SPACE(0)),
N'OR P.CreationDate BETWEEN ',
N'CONVERT(datetime, ',
NCHAR(39),
CONVERT(nchar(23), D.dfrom, 121),
NCHAR(39),
N', 121)',
N' AND CONVERT(datetime, ',
NCHAR(39),
CONVERT(nchar(23), D.dto, 121),
NCHAR(39),
N', 121)',
NCHAR(13), NCHAR(10)
),
N''
)
FROM #d AS D;
SET @SQL += N'OPTION (RECOMPILE);' -- Plan reuse is unlikely
EXECUTE (@SQL);With the sample data, that produces:
SELECT Rows = COUNT_BIG(*)
FROM dbo.Posts AS P
WHERE 0 = 1
OR P.CreationDate BETWEEN CONVERT(datetime, '2013-11-20 00:00:00.000', 121)
AND CONVERT(datetime, '2013-12-05 00:00:00.000', 121)
OR P.CreationDate BETWEEN CONVERT(datetime, '2013-11-27 00:00:00.000', 121)
AND CONVERT(datetime, '2013-12-12 00:00:00.000', 121)
OPTION (RECOMPILE);Which SQL Server simplifies to a single range seek:
That runs in about 35ms for me.
In general, the optimizer will simplify the ranges as much as possible (much like a Merge Interval would do). There doesn't appear to be a limit on the number of separate range seeks in a single Index Seek operator. I got bored after 1440 seeks.
Indexed view
Rather than count rows over and over again, it may be helpful to store and maintain bucketized counts in an indexed view. The following implementation uses hour granularity:
CREATE OR ALTER VIEW dbo.PostsTimeBucket
WITH SCHEMABINDING
AS
SELECT
HourBucket =
DATEADD(HOUR,
DATEDIFF(HOUR,
CONVERT(datetime, '20000101', 112),
P.CreationDate),
CONVERT(datetime, '20000101', 112)),
NumRows = COUNT_BIG(*)
FROM dbo.Posts AS P
GROUP BY
DATEADD(HOUR,
DATEDIFF(HOUR,
CONVERT(datetime, '20000101', 112),
P.CreationDate),
CONVERT(datetime, '20000101', 112));
GO
CREATE UNIQUE CLUSTERED INDEX [CUQ dbo.PostsTimeBucket HourBucket]
ON dbo.PostsTimeBucket (HourBucket);The view takes around 2s to create on my laptop. At hour granularity, the indexed view holds 47,469 rows for my copy of the StackOverflow2013 database's Posts table with 17,142,169 rows.
The majority of the work can be done from the view. Only part-hour periods at the start and end of any range not covered by any other range needs to be processed separately. For example:
```
-- Helper inline function
CREATE OR ALTER FUNCTION dbo.RoundToHour (@d datetime)
RETURNS table
AS
RETURN
SELECT
HourBucket =
DATEADD(HOUR,
DATEDIFF(HOUR,
CONVERT(datetime, '2000010
Code Snippets
SELECT
NumRows = COUNT_BIG(DISTINCT P.Id)
FROM #d AS D
JOIN dbo.Posts AS P
ON P.CreationDate BETWEEN D.dfrom AND D.dto;WITH
Intervals AS
(
SELECT
IntervalStart =
ISNULL
(
LAG(Q1.NextStart) OVER (ORDER BY Q1.ThisFrom),
Q1.FirstFrom
),
IntervalEnd =
IIF
(
Q1.NextStart IS NOT NULL,
Q1.ThisEnd,
Q1.LastEnd
)
FROM
(
SELECT
ThisFrom = D.dfrom,
ThisEnd = D.dto,
-- Remember the start of the next row because that row
-- may get filtered out by the outer WHERE clase if it
-- is not also the end of an interval.
NextStart = LEAD(D.dfrom) OVER (
ORDER BY D.dfrom, D.dto),
-- Start point of the first interval
FirstFrom = MIN(D.dfrom) OVER (
ORDER BY D.dfrom, D.dto
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
-- End point of the last interval
LastEnd = MAX(D.dto) OVER (
ORDER BY D.dfrom, D.dto
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM #d AS D
WHERE
-- Valid intervals only
D.dto >= D.dfrom
) AS Q1
WHERE
-- Interval ends only
Q1.NextStart > Q1.ThisEnd
OR Q1.NextStart IS NULL
)
SELECT COUNT_BIG(*)
FROM dbo.Posts AS P
JOIN Intervals AS N
ON P.CreationDate BETWEEN N.IntervalStart AND N.IntervalEnd;DECLARE @SQL nvarchar(max) =
N'
SELECT Rows = COUNT_BIG(*)
FROM dbo.Posts AS P
WHERE 0 = 1
';
SELECT @SQL +=
STRING_AGG
(
CONCAT
(
CONVERT(nvarchar(max), SPACE(0)),
N'OR P.CreationDate BETWEEN ',
N'CONVERT(datetime, ',
NCHAR(39),
CONVERT(nchar(23), D.dfrom, 121),
NCHAR(39),
N', 121)',
N' AND CONVERT(datetime, ',
NCHAR(39),
CONVERT(nchar(23), D.dto, 121),
NCHAR(39),
N', 121)',
NCHAR(13), NCHAR(10)
),
N''
)
FROM #d AS D;
SET @SQL += N'OPTION (RECOMPILE);' -- Plan reuse is unlikely
EXECUTE (@SQL);SELECT Rows = COUNT_BIG(*)
FROM dbo.Posts AS P
WHERE 0 = 1
OR P.CreationDate BETWEEN CONVERT(datetime, '2013-11-20 00:00:00.000', 121)
AND CONVERT(datetime, '2013-12-05 00:00:00.000', 121)
OR P.CreationDate BETWEEN CONVERT(datetime, '2013-11-27 00:00:00.000', 121)
AND CONVERT(datetime, '2013-12-12 00:00:00.000', 121)
OPTION (RECOMPILE);CREATE OR ALTER VIEW dbo.PostsTimeBucket
WITH SCHEMABINDING
AS
SELECT
HourBucket =
DATEADD(HOUR,
DATEDIFF(HOUR,
CONVERT(datetime, '20000101', 112),
P.CreationDate),
CONVERT(datetime, '20000101', 112)),
NumRows = COUNT_BIG(*)
FROM dbo.Posts AS P
GROUP BY
DATEADD(HOUR,
DATEDIFF(HOUR,
CONVERT(datetime, '20000101', 112),
P.CreationDate),
CONVERT(datetime, '20000101', 112));
GO
CREATE UNIQUE CLUSTERED INDEX [CUQ dbo.PostsTimeBucket HourBucket]
ON dbo.PostsTimeBucket (HourBucket);Context
StackExchange Database Administrators Q#329139, answer score: 9
Revisions (0)
No revisions yet.