patternsqlMajor

Most Efficient Way to Retrieve Date Ranges

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

efficientwaydaterangesretrievemost

Problem

What's the most efficient way to retrieve date ranges with a table structure like this?

create table SomeDateTable
(
    id int identity(1, 1) not null,
    StartDate datetime not null,
    EndDate datetime not null
)
go

Say you want a range for both StartDate and EndDate. So in other words, if StartDate falls in between @StartDateBegin and @StartDateEnd, and EndDate falls in between @EndDateBegin and @EndDateEnd, then do something.

I know there are a few ways to probably go about this, but what is the most advised?

Solution

This is a hard problem to solve in general, but there are a couple of things we can do to help the optimizer choose a plan. This script creates a table with 10,000 rows with a known pseudo-random distribution of rows to illustrate:

CREATE TABLE dbo.SomeDateTable
(
    Id          INTEGER IDENTITY(1, 1) PRIMARY KEY NOT NULL,
    StartDate   DATETIME NOT NULL,
    EndDate     DATETIME NOT NULL
);
GO
SET STATISTICS XML OFF
SET NOCOUNT ON;
DECLARE
    @i  INTEGER = 1,
    @s  FLOAT = RAND(20120104),
    @e  FLOAT = RAND();

WHILE @i <= 10000
BEGIN
    INSERT dbo.SomeDateTable
        (
        StartDate, 
        EndDate
        )
    VALUES
        (
        DATEADD(DAY, @s * 365, {d '2009-01-01'}),
        DATEADD(DAY, @s * 365 + @e * 14, {d '2009-01-01'})
        )
        
    SELECT
        @s = RAND(),
        @e = RAND(),
        @i += 1
END

The first question is how to index this table. One option is to provide two indexes on the DATETIME columns, so the optimizer can at least choose whether to seek on StartDate or EndDate.

CREATE INDEX nc1 ON dbo.SomeDateTable (StartDate, EndDate)
CREATE INDEX nc2 ON dbo.SomeDateTable (EndDate, StartDate)

Naturally, the inequalities on both StartDate and EndDate mean that only one column in each index can support a seek in the example query, but this is about the best we can do. We might consider making the second column in each index an INCLUDE rather than a key, but we might have other queries that can perform an equality seek on the leading column and an inequality seek on the second column. Also, we may get better statistics this way. Anyway...

DECLARE
    @StartDateBegin DATETIME = {d '2009-08-01'},
    @StartDateEnd DATETIME = {d '2009-10-15'},
    @EndDateBegin DATETIME = {d '2009-08-05'},
    @EndDateEnd DATETIME = {d '2009-10-22'}

SELECT
    COUNT_BIG(*)
FROM dbo.SomeDateTable AS sdt
WHERE
    sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
    AND sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd

This query uses variables, so in general the optimizer will guess at selectivity and distribution, resulting in a guessed cardinality estimate of 81 rows. In fact, the query produces 2076 rows, a discrepancy that might be important in a more complex example.

On SQL Server 2008 SP1 CU5 or later (or R2 RTM CU1) we can take advantage of the Parameter Embedding Optimization to get better estimates, simply by adding OPTION (RECOMPILE) to the SELECT query above. This causes a compilation just before the batch executes, allowing SQL Server to 'see' the real parameter values and optimize for those. With this change, the estimate improves to 468 rows (though you do need to check the runtime plan to see this). This estimate is better than 81 rows, but still not all that close. The modelling extensions enabled by trace flag 2301 may help in some cases, but not with this query.

The problem is where the rows qualified by the two range searches overlap. One of the simplifying assumptions made in the optimizer's costing and cardinality estimation component is that predicates are independent (so if both have a selectivity of 50%, the result of applying both is assumed to qualify 50% of 50% = 25% of the rows). Where this sort of correlation is a problem, we can often work around it with multi-column and/or filtered statistics. With two ranges with unknown start and end points, this becomes impractical. This is where we sometimes have to resort to rewriting the query to a form that happens to produce a better estimate:

SELECT COUNT(*) FROM
(
    SELECT
        sdt.Id
    FROM dbo.SomeDateTable AS sdt
    WHERE 
        sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
    INTERSECT
    SELECT
        sdt.Id
    FROM dbo.SomeDateTable AS sdt 
    WHERE
        sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
) AS intersected (id)
OPTION (RECOMPILE)

This form happens to produce a runtime estimate of 2110 rows (versus 2076 actual). Unless you have TF 2301 on, in which case the more advanced modelling techniques see through the trick and produce exactly the same estimate as before: 468 rows.

One day SQL Server might gain native support for intervals. If that comes with good statistical support, developers might dread tuning query plans like this a little less.

Code Snippets

CREATE TABLE dbo.SomeDateTable
(
    Id          INTEGER IDENTITY(1, 1) PRIMARY KEY NOT NULL,
    StartDate   DATETIME NOT NULL,
    EndDate     DATETIME NOT NULL
);
GO
SET STATISTICS XML OFF
SET NOCOUNT ON;
DECLARE
    @i  INTEGER = 1,
    @s  FLOAT = RAND(20120104),
    @e  FLOAT = RAND();

WHILE @i <= 10000
BEGIN
    INSERT dbo.SomeDateTable
        (
        StartDate, 
        EndDate
        )
    VALUES
        (
        DATEADD(DAY, @s * 365, {d '2009-01-01'}),
        DATEADD(DAY, @s * 365 + @e * 14, {d '2009-01-01'})
        )
        
    SELECT
        @s = RAND(),
        @e = RAND(),
        @i += 1
END

CREATE INDEX nc1 ON dbo.SomeDateTable (StartDate, EndDate)
CREATE INDEX nc2 ON dbo.SomeDateTable (EndDate, StartDate)

DECLARE
    @StartDateBegin DATETIME = {d '2009-08-01'},
    @StartDateEnd DATETIME = {d '2009-10-15'},
    @EndDateBegin DATETIME = {d '2009-08-05'},
    @EndDateEnd DATETIME = {d '2009-10-22'}

SELECT
    COUNT_BIG(*)
FROM dbo.SomeDateTable AS sdt
WHERE
    sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
    AND sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd

SELECT COUNT(*) FROM
(
    SELECT
        sdt.Id
    FROM dbo.SomeDateTable AS sdt
    WHERE 
        sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
    INTERSECT
    SELECT
        sdt.Id
    FROM dbo.SomeDateTable AS sdt 
    WHERE
        sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
) AS intersected (id)
OPTION (RECOMPILE)

Context

StackExchange Database Administrators Q#10113, answer score: 32

Revisions (0)

No revisions yet.