patternsqlMinor
Why is DistinctSort followed by TopNSort in my query plan?
Viewed 0 times
whyquerydistinctsortfollowedplantopnsort
Problem
I have the following definition:
and the following query:
I run this query and inspect the actual execution plan and here's what's going on:
What's this magical
CREATE TABLE [dbo].[JobItems] (
[ItemId] UNIQUEIDENTIFIER NOT NULL,
[ItemState] INT NOT NULL,
[ItemCreationTime] DATETIME NULL DEFAULT GETUTCDATE(),
[ItemPriority] TINYINT NOT NULL DEFAULT(0),
[ItemRefreshTime] DATETIME NULL,
-- lots of other columns
CONSTRAINT [PrimaryKey_GUID_HERE] PRIMARY KEY NONCLUSTERED ([ItemId] ASC)
);
CREATE UNIQUE CLUSTERED INDEX [JobItemsIndex]
ON [dbo].[JobItems]([ItemId] ASC);
CREATE INDEX [GetTaskToProcessIndex]
ON [dbo].[JobItems]([ItemState], [ItemPriority], [ItemCreationTime])and the following query:
SELECT TOP(1) ItemId FROM JobItems
WHERE ItemState = 5 OR
( ( ItemState = 11 ) AND ( DATEDIFF( SECOND, ItemRefreshTime, GETUTCDATE() ) > 14 ) )
ORDER BY ItemPriority ASC, ItemCreationTime ASCI run this query and inspect the actual execution plan and here's what's going on:
- Index seek is done to retrieve items with
ItemState=5.
- Index seek is done to retrieve items with
ItemState=11and then for each row a separate seek is done to retrieveItemRefreshTimeand the results of two seeks are filtered using nested loops.
- Sets from 1 and 2 containing
ItemId,ItemCreationTimeandItemPriorityare concatenated and then...
- Magical
DistinctSorthappens withORDER BY ItemId ASCand finally
TopNSorthappens withORDER BY ItemPriority ASC, ItemCreationTime ASC
TopNSort and DistinctSort take something like 32 percent each so I'd be happy to get rid of DistinctSort - I don't even understand its purpose.What's this magical
TopNSort which useful DistinctSort and why is it there?Solution
I can reproduce the plan that you describe on SQL Server 2012 (on prem) by running the DDL in your question and then fiddling the stats so SQL Server thinks that the table is much larger than reality.
And then running the query with
This is an index union plan. The concatenation operator implements
An example of where this might be needed is in the query below. (note the two parameters are set to the same value so an index union plan there would seek the same rows twice)
The Top N Sort then re-sorts the data to implement the
In your case the Distinct Sort is logically not required for multiple reasons. The
An alternative way of writing the query (that gives a better plan using the indexes to avoid any sorts at all) is
You could consider adding
UPDATE STATISTICS [dbo].[JobItems] WITH ROWCOUNT = 10000000, pagecount = 10000000And then running the query with
OPTION (MAXDOP 1, CONCAT UNION, ORDER GROUP).This is an index union plan. The concatenation operator implements
UNION ALL. The Distinct Sort changes the semantics to a UNION operation to prevent the same row being returned multiple times. (In the event the table had no index key to act as a unique row identifier the physical rid would have been used here to avoid incorrectly de-duping different rows that happen to have the same column values)An example of where this might be needed is in the query below. (note the two parameters are set to the same value so an index union plan there would seek the same rows twice)
DECLARE @ItemState1 INT = 5
, @ItemState2 INT = 5
SELECT ItemId
FROM JobItems
WHERE ItemState = @ItemState1
OR ( ( ItemState = @ItemState2 )
AND ( DATEDIFF(SECOND, ItemRefreshTime, GETUTCDATE()) > 14 ) )The Top N Sort then re-sorts the data to implement the
TOP 1.In your case the Distinct Sort is logically not required for multiple reasons. The
ItemState = 5 and ItemState = 11 branches are mutually exclusive (and this could be determined at compile time) and additionally the TOP 1 ... ORDER BY ItemPriority ASC, ItemCreationTime ASC semantics wouldn't be affected even if there were erroneous duplicates.An alternative way of writing the query (that gives a better plan using the indexes to avoid any sorts at all) is
SELECT TOP(1) ItemId
FROM (SELECT ItemId,
ItemPriority,
ItemCreationTime
FROM JobItems
WHERE ItemState = 5
UNION ALL
SELECT ItemId,
ItemPriority,
ItemCreationTime
FROM JobItems
WHERE ( ( ItemState = 11 )
AND ( DATEDIFF(SECOND, ItemRefreshTime, GETUTCDATE()) > 14 ) )) T
ORDER BY ItemPriority ASC,
ItemCreationTime ASCYou could consider adding
ItemRefreshTime as an included column to the index to avoid the key lookup if in practice quite a few may be required before finding a single row satisfying the residual predicate.Code Snippets
UPDATE STATISTICS [dbo].[JobItems] WITH ROWCOUNT = 10000000, pagecount = 10000000DECLARE @ItemState1 INT = 5
, @ItemState2 INT = 5
SELECT ItemId
FROM JobItems
WHERE ItemState = @ItemState1
OR ( ( ItemState = @ItemState2 )
AND ( DATEDIFF(SECOND, ItemRefreshTime, GETUTCDATE()) > 14 ) )SELECT TOP(1) ItemId
FROM (SELECT ItemId,
ItemPriority,
ItemCreationTime
FROM JobItems
WHERE ItemState = 5
UNION ALL
SELECT ItemId,
ItemPriority,
ItemCreationTime
FROM JobItems
WHERE ( ( ItemState = 11 )
AND ( DATEDIFF(SECOND, ItemRefreshTime, GETUTCDATE()) > 14 ) )) T
ORDER BY ItemPriority ASC,
ItemCreationTime ASCContext
StackExchange Database Administrators Q#108868, answer score: 7
Revisions (0)
No revisions yet.