patternModerate

Poor performing subquery with date comparisons

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

performingwithdatepoorsubquerycomparisons

Problem

When using a subquery to find the total count of all prior records with a matching field, the performance is terrible on a table with as little as 50k records. Without the subquery, the query executes in a few milliseconds. With the subquery, the execution time is upwards of a minute.

For this query, the result must:

Include only those records within a given date range.

Include a count of all prior records, not including the current record, regardless of date range.

Basic Table Schema

Activity
======================
Id int Identifier
Address varchar(25)
ActionDate datetime2
Process varchar(50)
-- 7 other columns

Example Data

Id  Address     ActionDate (Time part excluded for simplicity)
===========================
99  000         2017-05-30
98  111         2017-05-30
97  000         2017-05-29
96  000         2017-05-28
95  111         2017-05-19
94  222         2017-05-30

Expected Results

For the date range of 2017-05-29 to 2017-05-30

Id  Address     ActionDate    PriorCount
=========================================
99  000         2017-05-30    2  (3 total, 2 prior to ActionDate)
98  111         2017-05-30    1  (2 total, 1 prior to ActionDate)
94  222         2017-05-30    0  (1 total, 0 prior to ActionDate)
97  000         2017-05-29    1  (3 total, 1 prior to ActionDate)

Records 96 and 95 are excluded from the result, but are included in the PriorCount subquery

Current Query

select 
    *.a
    , ( select count(*) 
        from Activity
        where 
            Activity.Address = a.Address
            and Activity.ActionDate < a.ActionDate
    ) as PriorCount
from Activity a
where a.ActionDate between '2017-05-29' and '2017-05-30'
order by a.ActionDate desc

Current Index

```
CREATE NONCLUSTERED INDEX [IDX_my_nme] ON [dbo].[Activity]
(
[ActionDate] ASC
)
INCLUDE ([Address]) WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALL

Solution

With the index definition that you have for IDX_my_nme, SQL Server will be able to seek using the ActionDate column but not with the Address column. The index contains all of the columns needed to cover the subquery but it likely isn't very selective for that subquery. Suppose that almost all of the data in the table has an ActionDate value of earlier than '2017-05-30'. A seek of ActionDate < '2017-05-30' will return almost all of the rows from the index, which are further filtered down after the row is fetched from the index. If your query returns 200 rows then you would probably doing almost 200 full index scans on IDX_my_nme, which means you will read around 50000 * 200 = 10 million rows from the index.

It's likely that seeking on Address will be far more selective for your subquery, although you haven't given us full statistical information about the query so that's an assumption on my part. However, suppose that you create an index on just Address and your table has 10k unique values for Address. With the new index, SQL Server will only need to seek 5 rows from the index for each execution of the subquery, so you'll read around 200 * 5 = 1000 rows from the index.

I'm testing against SQL Server 2016 so there might be some minor syntax differences. Below is some sample data in which I made similar assumptions to the above for data distribution:

CREATE TABLE #Activity (
    Id int NOT NULL,
    [Address] varchar(25) NULL,
    ActionDate datetime2 NULL,
    FILLER varchar(100),
    PRIMARY KEY (Id)
);

INSERT INTO #Activity WITH (TABLOCK)
SELECT TOP (50000) -- 50k total rows
x.RN
, x.RN % 10000 -- 10k unique addresses
, DATEADD(DAY, x.RN / 100, '20160201') -- 100 rows per day
, REPLICATE('Z', 100)
FROM
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) x;

CREATE NONCLUSTERED INDEX [IDX_my_nme] ON #Activity
([ActionDate] ASC) INCLUDE ([Address]);

I've created your index as described in the question. I'm testing against this query which returns the same data as the one in the question:

select 
    a.*
    , ( select count(*) 
        from #Activity Activity
        where 
            Activity.[Address] = a.[Address]
            and Activity.ActionDate < a.ActionDate
    ) as PriorCount
from #Activity a
where a.ActionDate between '2017-05-29' and '2017-05-30'
order by a.ActionDate desc;

I get an index spool. What that means at a basic level is that the query optimizer build a temporary index on-the-fly because none of the existing indexes against the table were suitable.

The query still finishes quickly for me. Perhaps you aren't getting the index spool optimization on your system or there's something different about the table definition or the query. For educational purposes I can use an undocumented feature OPTION (QUERYRULEOFF BuildSpool) to disable the index spool. Here's what the plan looks like:

Don't be fooled by the appearance of a simple index seek. SQL Server reads nearly 10 million rows from the index:

If I'm going to be running the query more than once then it probably doesn't make sense for the query optimizer to create an index each time it runs. I could create an index upfront that would be more selective for this query:

CREATE NONCLUSTERED INDEX [IDX_my_nme_2] ON #Activity
([Address] ASC) INCLUDE (ActionDate);

The plan is similar to before:

However, with the new index SQL Server only reads 1000 rows from the index. 800 of the rows are returned to be counted. The index could be defined to be more selective but this could be good enough depending on your data distribution.

If you aren't able to define any additional indexes on the table I would consider using window functions. The following appears to work:

SELECT t.*
FROM
(
    select 
        a.*
        , -1 + ROW_NUMBER() OVER (PARTITION BY [Address] ORDER BY ActionDate) PriorCount
    from #Activity a
) t
where t.ActionDate between '2017-05-29' and '2017-05-30'
order by t.ActionDate desc;

That query does a single scan of the data but does an expensive sort and calculates the ROW_NUMBER() function for every row in the table, so it feels like there's some extra work done here:

However, if you really like that code pattern you could define an index to make it more efficient:

CREATE NONCLUSTERED INDEX [IDX_my_nme] ON #Activity
([Address], [ActionDate]) INCLUDE (FILLER);

That moves the sort towards the end which will be much less expensive:

If none of this helps then you'll need to add more information to the question, preferably including actual execution plans.

Code Snippets

CREATE TABLE #Activity (
    Id int NOT NULL,
    [Address] varchar(25) NULL,
    ActionDate datetime2 NULL,
    FILLER varchar(100),
    PRIMARY KEY (Id)
);

INSERT INTO #Activity WITH (TABLOCK)
SELECT TOP (50000) -- 50k total rows
x.RN
, x.RN % 10000 -- 10k unique addresses
, DATEADD(DAY, x.RN / 100, '20160201') -- 100 rows per day
, REPLICATE('Z', 100)
FROM
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) x;

CREATE NONCLUSTERED INDEX [IDX_my_nme] ON #Activity
([ActionDate] ASC) INCLUDE ([Address]);

select 
    a.*
    , ( select count(*) 
        from #Activity Activity
        where 
            Activity.[Address] = a.[Address]
            and Activity.ActionDate < a.ActionDate
    ) as PriorCount
from #Activity a
where a.ActionDate between '2017-05-29' and '2017-05-30'
order by a.ActionDate desc;

CREATE NONCLUSTERED INDEX [IDX_my_nme_2] ON #Activity
([Address] ASC) INCLUDE (ActionDate);

SELECT t.*
FROM
(
    select 
        a.*
        , -1 + ROW_NUMBER() OVER (PARTITION BY [Address] ORDER BY ActionDate) PriorCount
    from #Activity a
) t
where t.ActionDate between '2017-05-29' and '2017-05-30'
order by t.ActionDate desc;

CREATE NONCLUSTERED INDEX [IDX_my_nme] ON #Activity
([Address], [ActionDate]) INCLUDE (FILLER);

Context

StackExchange Database Administrators Q#173550, answer score: 17

Revisions (0)

No revisions yet.