patternsqlModerate

Query doesn't respond when adding two columns

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

columnsqueryaddingtwodoesnwhenrespond

Problem

When I add two columns to my select the query doesn't respond. The type of column is nvarchar(2000). It's a little unusual.

The SQL Server version is 2014.

There is only one primary index.

The whole records is only 1000 rows.

Here is execution plan before (XML showplan):

Execution plan after (XML showplan):

Here is the query:

select top(100)
  Batch_Tasks_Queue.id,
  btq.id,
  Batch_Tasks_Queue.[Parameters], -- this field
  btq.[Parameters]  -- and this field
from
        Batch_Tasks_Queue with(nolock)
    inner join  Batch_Tasks_Queue btq with(nolock)  on  Batch_Tasks_Queue.Start_Time  btq.id                            
                            and btq.Start_Time is not null
                            and btq.State in (3, 4)                          
where
    Batch_Tasks_Queue.Start_Time is not null      
    and Batch_Tasks_Queue.State in (3, 4)
    and Batch_Tasks_Queue.Operation_Type = btq.Operation_Type
    and Batch_Tasks_Queue.Operation_Type not in (23, 24, 25, 26, 27, 28, 30)

order by
    Batch_Tasks_Queue.Start_Time desc

The whole result count is 17 rows. The dirty data (nolock hint) is not important.

Here is the table structure:

```
CREATE TABLE [dbo].Batch_Tasks_Queue NOT NULL,
[Operation_Type] numeric NULL,
[Request_Time] [datetime] NOT NULL,
[Description] varchar NULL,
[State] numeric NOT NULL,
[Start_Time] [datetime] NULL,
[Finish_Time] [datetime] NULL,
[Parameters] nvarchar NULL,
[Response] nvarchar NULL,
[Billing_UserId] [int] NOT NULL,
[Planned_Start_Time] [datetime] NULL,
[Input_FileId] [uniqueidentifier] NULL,
[Output_FileId] [uniqueidentifier] NULL,
[PRIORITY] numeric NULL,
[EXECUTE_SEQ] numeric NULL,
[View_Access] numeric NULL,
[Seeing] numeric NULL,
CONSTRAINT [PKBachTskQ] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (P

Solution

Summary

The main problems are:

The optimizer's plan selection assumes a uniform distribution of values.

A lack of suitable indexes means:

Scanning the table is the only option.

The join is a naive nested loops join, rather than an index nested loops join. In a naive join, the join predicates are evaluated at the join rather than being pushed down the inner side of the join.

Details

The two plans are fundamentally pretty similar, though performance may be very different:

Plan with the extra columns

Taking the one with the extra columns that doesn't complete in a reasonable time first:

The interesting features are:

The Top at node 0 limits the rows returned to 100. It also sets a row goal for the optimizer, so everything below it in the plan is chosen to return the first 100 rows quickly.

The Scan at node 4 finds rows from the table where the Start_Time is not null, State is 3 or 4, and Operation_Type is one of the listed values. The table is fully scanned once, with each row being tested against the predicates mentioned. Only rows that pass all the tests flow on to the Sort. The optimizer estimates that 38,283 rows will qualify.

The Sort at node 3 consumes all the rows from the Scan at node 4, and sorts them in order of Start_Time DESC. This is the final presentation order requested by the query.

The optimizer estimates that 93 rows (actually 93.2791) will have to be read from the Sort in order for the whole plan to return 100 rows (accounting for the expected effect of the join).

The Nested Loops join at node 2 is expected to execute its inner input (the lower branch) 94 times (actually 94.2791). The extra row is required by the stop parallelism exchange at node 1 for technical reasons.

The Scan at node 5 fully scans the table on each iteration. It finds rows where Start_Time is not null and State is 3 or 4. This is estimated to produce 400,875 rows on each iteration. Over 94.2791 iterations, the total number of rows is almost 38 million.

The Nested Loops join at node 2 also applies the join predicates. It checks that Operation_Type matches, that the Start_Time from node 4 is less than the Start_Time from node 5, that the Start_Time from node 5 is less than the Finish_Time from node 4, and that the two Id values do not match.

The Gather Streams (stop parallelism exchange) at node 1 merges the ordered streams from each thread until 100 rows have been produced. The order-preserving nature of the merge across multiple streams is what requires the extra row mentioned in step 5.

The great inefficiency is obviously at steps 6 and 7 above. Fully scanning the table at node 5 for each iteration is only even slightly reasonable if it only happens 94 times as the optimizer predicts. The ~38 million per-row set of comparisons at node 2 is also a large cost.

Crucially, the 93/94 row row goal estimation is also quite likely to be wrong, since it depends on the distribution of values. The optimizer assumes uniform distribution in the absence of more detailed information. In simple terms, this means that if 1% of the rows in the table are expected to qualify, the optimizer reasons that to find 1 matching row, it needs to read 100 rows.

If you ran this query to completion (which might take a very long time), you would most likely find that many more than 93/94 rows had to be read from the Sort in order to finally produce 100 rows. In the worst case, the 100th row would be found using the last row from the Sort. Assuming the optimizer's estimate at node 4 is correct, this means running the Scan at node 5 38,284 times, for a total of something like 15 billion rows. It could be more if the Scan estimates are also off.

This execution plan also includes a missing index warning:

/*
The Query Processor estimates that implementing the following index
could improve the query cost by 72.7096%.

WARNING: This is only an estimate, and the Query Processor is making
this recommendation based solely upon analysis of this specific query.
It has not considered the resulting index size, or its workload-wide
impact, including its impact on INSERT, UPDATE, DELETE performance.
These factors should be taken into account before creating this index.
*/

CREATE NONCLUSTERED INDEX []
ON [dbo].[Batch_Tasks_Queue] ([Operation_Type],[State],[Start_Time])
INCLUDE ([Id],[Parameters])

The optimizer is alerting you to the fact that adding an index to the table would improve performance.

Plan without the extra columns

This is essentially the exact same plan as the previous one, with the addition of the Index Spool at node 6 and the Filter at node 5. The important differences are:

The Index Spool at node 6 is an Eager Spool. It eagerly consumes the result of the scan below it, and builds a temporary index keyed on Operation_Type and Start_Time, with Id as a non-key column.

The Nested Loops Join at node 2 is now an index join. No join predicates are evaluated here, i

Code Snippets

/*
The Query Processor estimates that implementing the following index
could improve the query cost by 72.7096%.

WARNING: This is only an estimate, and the Query Processor is making
this recommendation based solely upon analysis of this specific query.
It has not considered the resulting index size, or its workload-wide
impact, including its impact on INSERT, UPDATE, DELETE performance.
These factors should be taken into account before creating this index.
*/

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Batch_Tasks_Queue] ([Operation_Type],[State],[Start_Time])
INCLUDE ([Id],[Parameters])

/*
The Query Processor estimates that implementing the following index
could improve the query cost by 24.1475%.

WARNING: This is only an estimate, and the Query Processor is making
this recommendation based solely upon analysis of this specific query.
It has not considered the resulting index size, or its workload-wide
impact, including its impact on INSERT, UPDATE, DELETE performance.
These factors should be taken into account before creating this index.
*/

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Batch_Tasks_Queue] ([State],[Start_Time])
INCLUDE ([Id],[Operation_Type])
GO

CREATE NONCLUSTERED INDEX i1
ON dbo.Batch_Tasks_Queue (Start_Time DESC)
INCLUDE (Operation_Type, [State], Finish_Time);

SELECT TOP (100)
    BTQ1.id,
    BTQ2.id,
    BTQ3.[Parameters],
    BTQ4.[Parameters]
FROM dbo.Batch_Tasks_Queue AS BTQ1
JOIN dbo.Batch_Tasks_Queue AS BTQ2 WITH (FORCESEEK)
    ON BTQ2.Operation_Type = BTQ1.Operation_Type
    AND BTQ2.Start_Time > BTQ1.Start_Time
    AND BTQ2.Start_Time < BTQ1.Finish_Time
    AND BTQ2.id != BTQ1.id
    -- Look up the [Parameters] values
JOIN dbo.Batch_Tasks_Queue AS BTQ3
    ON BTQ3.Id = BTQ1.Id
JOIN dbo.Batch_Tasks_Queue AS BTQ4
    ON BTQ4.Id = BTQ2.Id
WHERE
    BTQ1.[State] IN (3, 4)
    AND BTQ2.[State] IN (3, 4)
    AND BTQ1.Operation_Type NOT IN (23, 24, 25, 26, 27, 28, 30)
    AND BTQ2.Operation_Type NOT IN (23, 24, 25, 26, 27, 28, 30)
    -- These predicates are not strictly needed
    AND BTQ1.Start_Time IS NOT NULL
    AND BTQ2.Start_Time IS NOT NULL
ORDER BY
    BTQ1.Start_Time DESC;

SELECT TOP (100)
    BTQ1.id,
    BTQ2.id
FROM dbo.Batch_Tasks_Queue AS BTQ1
JOIN dbo.Batch_Tasks_Queue AS BTQ2 WITH (FORCESEEK)
    ON BTQ2.Operation_Type = BTQ1.Operation_Type
    AND BTQ2.Start_Time > BTQ1.Start_Time
    AND BTQ2.Start_Time < BTQ1.Finish_Time
    AND BTQ2.id != BTQ1.id
WHERE
    BTQ1.[State] IN (3, 4)
    AND BTQ2.[State] IN (3, 4)
    AND BTQ1.Operation_Type NOT IN (23, 24, 25, 26, 27, 28, 30)
    AND BTQ2.Operation_Type NOT IN (23, 24, 25, 26, 27, 28, 30)
    AND BTQ1.Start_Time IS NOT NULL
    AND BTQ2.Start_Time IS NOT NULL
ORDER BY
    BTQ1.Start_Time DESC;

Context

StackExchange Database Administrators Q#157308, answer score: 15

Revisions (0)

No revisions yet.