patternsqlMajor
Most Efficient Way to Retrieve Date Ranges
Viewed 0 times
efficientwaydaterangesretrievemost
Problem
What's the most efficient way to retrieve date ranges with a table structure like this?
Say you want a range for both
I know there are a few ways to probably go about this, but what is the most advised?
create table SomeDateTable
(
id int identity(1, 1) not null,
StartDate datetime not null,
EndDate datetime not null
)
goSay you want a range for both
StartDate and EndDate. So in other words, if StartDate falls in between @StartDateBegin and @StartDateEnd, and EndDate falls in between @EndDateBegin and @EndDateEnd, then do something.I know there are a few ways to probably go about this, but what is the most advised?
Solution
This is a hard problem to solve in general, but there are a couple of things we can do to help the optimizer choose a plan. This script creates a table with 10,000 rows with a known pseudo-random distribution of rows to illustrate:
The first question is how to index this table. One option is to provide two indexes on the
Naturally, the inequalities on both
This query uses variables, so in general the optimizer will guess at selectivity and distribution, resulting in a guessed cardinality estimate of 81 rows. In fact, the query produces 2076 rows, a discrepancy that might be important in a more complex example.
On SQL Server 2008 SP1 CU5 or later (or R2 RTM CU1) we can take advantage of the Parameter Embedding Optimization to get better estimates, simply by adding
The problem is where the rows qualified by the two range searches overlap. One of the simplifying assumptions made in the optimizer's costing and cardinality estimation component is that predicates are independent (so if both have a selectivity of 50%, the result of applying both is assumed to qualify 50% of 50% = 25% of the rows). Where this sort of correlation is a problem, we can often work around it with multi-column and/or filtered statistics. With two ranges with unknown start and end points, this becomes impractical. This is where we sometimes have to resort to rewriting the query to a form that happens to produce a better estimate:
This form happens to produce a runtime estimate of 2110 rows (versus 2076 actual). Unless you have TF 2301 on, in which case the more advanced modelling techniques see through the trick and produce exactly the same estimate as before: 468 rows.
One day SQL Server might gain native support for intervals. If that comes with good statistical support, developers might dread tuning query plans like this a little less.
CREATE TABLE dbo.SomeDateTable
(
Id INTEGER IDENTITY(1, 1) PRIMARY KEY NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NOT NULL
);
GO
SET STATISTICS XML OFF
SET NOCOUNT ON;
DECLARE
@i INTEGER = 1,
@s FLOAT = RAND(20120104),
@e FLOAT = RAND();
WHILE @i <= 10000
BEGIN
INSERT dbo.SomeDateTable
(
StartDate,
EndDate
)
VALUES
(
DATEADD(DAY, @s * 365, {d '2009-01-01'}),
DATEADD(DAY, @s * 365 + @e * 14, {d '2009-01-01'})
)
SELECT
@s = RAND(),
@e = RAND(),
@i += 1
ENDThe first question is how to index this table. One option is to provide two indexes on the
DATETIME columns, so the optimizer can at least choose whether to seek on StartDate or EndDate.CREATE INDEX nc1 ON dbo.SomeDateTable (StartDate, EndDate)
CREATE INDEX nc2 ON dbo.SomeDateTable (EndDate, StartDate)Naturally, the inequalities on both
StartDate and EndDate mean that only one column in each index can support a seek in the example query, but this is about the best we can do. We might consider making the second column in each index an INCLUDE rather than a key, but we might have other queries that can perform an equality seek on the leading column and an inequality seek on the second column. Also, we may get better statistics this way. Anyway...DECLARE
@StartDateBegin DATETIME = {d '2009-08-01'},
@StartDateEnd DATETIME = {d '2009-10-15'},
@EndDateBegin DATETIME = {d '2009-08-05'},
@EndDateEnd DATETIME = {d '2009-10-22'}
SELECT
COUNT_BIG(*)
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
AND sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEndThis query uses variables, so in general the optimizer will guess at selectivity and distribution, resulting in a guessed cardinality estimate of 81 rows. In fact, the query produces 2076 rows, a discrepancy that might be important in a more complex example.
On SQL Server 2008 SP1 CU5 or later (or R2 RTM CU1) we can take advantage of the Parameter Embedding Optimization to get better estimates, simply by adding
OPTION (RECOMPILE) to the SELECT query above. This causes a compilation just before the batch executes, allowing SQL Server to 'see' the real parameter values and optimize for those. With this change, the estimate improves to 468 rows (though you do need to check the runtime plan to see this). This estimate is better than 81 rows, but still not all that close. The modelling extensions enabled by trace flag 2301 may help in some cases, but not with this query.The problem is where the rows qualified by the two range searches overlap. One of the simplifying assumptions made in the optimizer's costing and cardinality estimation component is that predicates are independent (so if both have a selectivity of 50%, the result of applying both is assumed to qualify 50% of 50% = 25% of the rows). Where this sort of correlation is a problem, we can often work around it with multi-column and/or filtered statistics. With two ranges with unknown start and end points, this becomes impractical. This is where we sometimes have to resort to rewriting the query to a form that happens to produce a better estimate:
SELECT COUNT(*) FROM
(
SELECT
sdt.Id
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
INTERSECT
SELECT
sdt.Id
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
) AS intersected (id)
OPTION (RECOMPILE)This form happens to produce a runtime estimate of 2110 rows (versus 2076 actual). Unless you have TF 2301 on, in which case the more advanced modelling techniques see through the trick and produce exactly the same estimate as before: 468 rows.
One day SQL Server might gain native support for intervals. If that comes with good statistical support, developers might dread tuning query plans like this a little less.
Code Snippets
CREATE TABLE dbo.SomeDateTable
(
Id INTEGER IDENTITY(1, 1) PRIMARY KEY NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NOT NULL
);
GO
SET STATISTICS XML OFF
SET NOCOUNT ON;
DECLARE
@i INTEGER = 1,
@s FLOAT = RAND(20120104),
@e FLOAT = RAND();
WHILE @i <= 10000
BEGIN
INSERT dbo.SomeDateTable
(
StartDate,
EndDate
)
VALUES
(
DATEADD(DAY, @s * 365, {d '2009-01-01'}),
DATEADD(DAY, @s * 365 + @e * 14, {d '2009-01-01'})
)
SELECT
@s = RAND(),
@e = RAND(),
@i += 1
ENDCREATE INDEX nc1 ON dbo.SomeDateTable (StartDate, EndDate)
CREATE INDEX nc2 ON dbo.SomeDateTable (EndDate, StartDate)DECLARE
@StartDateBegin DATETIME = {d '2009-08-01'},
@StartDateEnd DATETIME = {d '2009-10-15'},
@EndDateBegin DATETIME = {d '2009-08-05'},
@EndDateEnd DATETIME = {d '2009-10-22'}
SELECT
COUNT_BIG(*)
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
AND sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEndSELECT COUNT(*) FROM
(
SELECT
sdt.Id
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
INTERSECT
SELECT
sdt.Id
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
) AS intersected (id)
OPTION (RECOMPILE)Context
StackExchange Database Administrators Q#10113, answer score: 32
Revisions (0)
No revisions yet.