patternsqlModerate
Creating an Index that is not used by a (SELECT) query reduces performance of that query
Viewed 0 times
reducescreatingusedqueryselectthatperformanceindexnot
Problem
I have just watched this video by Pinal Dave.
He has a
He then creates an index (Let's call it,
However, he then creates a further index (we'll call it
He then runs his query again and despite
He doesn't actually seem to desribe what causes this (unless I missed it)
The code to reproduce is below (thanks to Martin Smith for providing the Pastebin) This assumes a vanilla version of AdventureWorks with the standard indexes on
here is the plan before we do anything (nested loops and table spools)
Then we create an index
```
CREATE NONCLUSTERED INDEX IX_Index1 ON Sales.SalesOrderDetail (SalesOrderID, P
He has a
SELECT query that produces ~370k reads in tempdb and ~1200 reads of the table the query is SELECTing from.He then creates an index (Let's call it,
Index1) which removes the tempdb spool and therefore improves the performance of the query. All OK so far.However, he then creates a further index (we'll call it
Index2) and leaves the Index1 intact.He then runs his query again and despite
Index2 not being used, the query performance returns to how it was originally, with the ~370k tempdb spool still in place.He doesn't actually seem to desribe what causes this (unless I missed it)
The code to reproduce is below (thanks to Martin Smith for providing the Pastebin) This assumes a vanilla version of AdventureWorks with the standard indexes on
SalesOrderDetailSET STATISTICS XML ON;
SET STATISTICS IO ON
GO
-- The query
DBCC FREEPROCCACHE;
SELECT SalesOrderID, ProductId,SalesOrderDetailID, OrderQty
FROM Sales.SalesOrderDetail sod
WHERE ProductID = (SELECT AVG(ProductID)
FROM Sales.SalesOrderDetail sod1
WHERE sod.SalesOrderID = sod1.SalesOrderID
GROUP BY sod1.SalesOrderID);
/*
(11110 rows affected)
Table 'Worktable'. Scan count 3, logical reads 368495, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 2, page server reads 0, read-ahead reads 1284, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
*/here is the plan before we do anything (nested loops and table spools)
Then we create an index
```
CREATE NONCLUSTERED INDEX IX_Index1 ON Sales.SalesOrderDetail (SalesOrderID, P
Solution
Current State
In your "good" case the plan looks as below. The upper branch is sorted by
Your "bad" case uses a typical execution plan for a row-mode windowed aggregate (with a common subexpression spool). It isn't as bad as the linked video makes out. Reads are reported per row read not per page read for these work tables (so multiplying reads by 8 KB to calculate the data in the spool is certainly invalid) but nonetheless the "bad" case is costed more according to SQL Server's cost model so why does it choose it?
Merge join refresher
A merge join on columns
How does it choose between
This seems to be a general limitation of merge join and composite join predicates. In the absence of any external reason to choose a specified ordering (e.g. an explicit
For table sources apparently it will look for the first index it comes across that is suitable for providing the required columns in either order and adopt the key column order from that as the order required for the merge join.
The order used for index matching appears to be in reverse order of
It will not do any analysis as to whether subsequent indexes might be better suited (e.g. as narrower or more compatible with index hints already used in the query).
I'm not 100% certain that this is the code responsible but the call stack below indicates that the implementation rule for sort merge join is looking in table metadata to find out some "Natural Sort"
Another example of adding an index making the plan worse - concentrating on merge join
In your "good" case the plan looks as below. The upper branch is sorted by
SalesOrderID (and therefore also by SalesOrderID,ProductId as the grouping ensures there is only one row per SalesOrderID). The lower branch reads the index in SalesOrderID,ProductId order and they are merge joined together.Your "bad" case uses a typical execution plan for a row-mode windowed aggregate (with a common subexpression spool). It isn't as bad as the linked video makes out. Reads are reported per row read not per page read for these work tables (so multiplying reads by 8 KB to calculate the data in the spool is certainly invalid) but nonetheless the "bad" case is costed more according to SQL Server's cost model so why does it choose it?
Merge join refresher
A merge join on columns
X,Y requires both inputs to the merge join be ordered in a compatible manner. At a minimum both inputs must be ordered at least by the same initial column (i.e. both ordered by X or both ordered by Y). For greatest efficiency (i.e. ideally to avoid overhead of "many to many" merge join) they should generally both be ordered in the same manner for all keys involved in the equi join predicate - i.e. either ORDER BY X,Y or ORDER BY Y,X (the ASC, DESC direction of each key is not important but must be the same in both inputs)How does it choose between
ORDER BY X,Y and ORDER BY Y,X?This seems to be a general limitation of merge join and composite join predicates. In the absence of any external reason to choose a specified ordering (e.g. an explicit
ORDER BY) it will just decide a somewhat arbitrary column order. For table sources apparently it will look for the first index it comes across that is suitable for providing the required columns in either order and adopt the key column order from that as the order required for the merge join.
The order used for index matching appears to be in reverse order of
indexid. This will generally correlate with index creation order but not always (as clustered indexes are reserved an indexid of 1 or a gap in ids may be filled in following a DROP INDEX)It will not do any analysis as to whether subsequent indexes might be better suited (e.g. as narrower or more compatible with index hints already used in the query).
I'm not 100% certain that this is the code responsible but the call stack below indicates that the implementation rule for sort merge join is looking in table metadata to find out some "Natural Sort"
Another example of adding an index making the plan worse - concentrating on merge join
CREATE TABLE #Test(X INT, Y INT, Filler CHAR(8000),INDEX IX1 UNIQUE /*Index id = 2*/ (X,Y));
INSERT INTO #Test (X,Y) SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY 1/0), ROW_NUMBER() OVER (ORDER BY 1/0) FROM sys.all_objects o1, sys.all_objects o2;
SET STATISTICS IO ON;
--#1 logical reads 60 uses IX1 - merge join chosen organically
SELECT t1.X, t1.Y
FROM #Test t1
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y
SET STATISTICS IO OFF;
CREATE UNIQUE CLUSTERED INDEX ix2 ON #Test(Y,X)
SET STATISTICS IO ON;
--#2 logical reads 50, Still using IX1 and merge join. The clustered index just created has an id of 1 so lower than 2.
--IX1 no longer contains a RID so reads a bit lower than previously
SELECT t1.X, t1.Y
FROM #Test t1
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y
SET STATISTICS IO OFF;
CREATE UNIQUE INDEX ix3 ON #Test(Y,X) INCLUDE (Filler);
SET STATISTICS IO ON;
--#3 logical reads 20,068 - No longer chooses MERGE join of its own accord and if forced uses more expensive index
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y;
--#4 Back to 50 reads. The merge join is happy to use the order required by the ORDER BY
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y
ORDER BY t1.X, t1.Y;
--#5 50 reads but now uses hash join
SELECT t1.X, t1.Y
FROM #Test t1
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y+0 = t2.Y+0;
--#6 50 reads, Forcing the merge join and looking at properties shows it is now seen as "many to many"
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y+0 = t2.Y+0;
SET STATISTICS IO OFF;
--What if there is no useful index?
DROP INDEX ix3 ON #Test
DROP INDEX ix1 ON #Test
DROP INDEX ix2 ON #Test
CREATE CLUSTERED INDEX ix4 ON #Test(X)
--#7 Sorts are by X, Y
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y;
--#8 Sorts are by Y, X
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.Y = t2.Y AND t1.X = t2.X;
DROP TABLE #Test- #1 There is a covering non clustered index on the table with columns
(X,Y)- Merge join is chosen naturally (without hints) and uses index ordered scan.
- #2 Add a clustered index with keys ordered
(Y,X). Merge join still chosen naturally and using originally created index. The clustered index will have an id of 1 - which is lower tha
Code Snippets
CREATE TABLE #Test(X INT, Y INT, Filler CHAR(8000),INDEX IX1 UNIQUE /*Index id = 2*/ (X,Y));
INSERT INTO #Test (X,Y) SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY 1/0), ROW_NUMBER() OVER (ORDER BY 1/0) FROM sys.all_objects o1, sys.all_objects o2;
SET STATISTICS IO ON;
--#1 logical reads 60 uses IX1 - merge join chosen organically
SELECT t1.X, t1.Y
FROM #Test t1
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y
SET STATISTICS IO OFF;
CREATE UNIQUE CLUSTERED INDEX ix2 ON #Test(Y,X)
SET STATISTICS IO ON;
--#2 logical reads 50, Still using IX1 and merge join. The clustered index just created has an id of 1 so lower than 2.
--IX1 no longer contains a RID so reads a bit lower than previously
SELECT t1.X, t1.Y
FROM #Test t1
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y
SET STATISTICS IO OFF;
CREATE UNIQUE INDEX ix3 ON #Test(Y,X) INCLUDE (Filler);
SET STATISTICS IO ON;
--#3 logical reads 20,068 - No longer chooses MERGE join of its own accord and if forced uses more expensive index
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y;
--#4 Back to 50 reads. The merge join is happy to use the order required by the ORDER BY
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y
ORDER BY t1.X, t1.Y;
--#5 50 reads but now uses hash join
SELECT t1.X, t1.Y
FROM #Test t1
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y+0 = t2.Y+0;
--#6 50 reads, Forcing the merge join and looking at properties shows it is now seen as "many to many"
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y+0 = t2.Y+0;
SET STATISTICS IO OFF;
--What if there is no useful index?
DROP INDEX ix3 ON #Test
DROP INDEX ix1 ON #Test
DROP INDEX ix2 ON #Test
CREATE CLUSTERED INDEX ix4 ON #Test(X)
--#7 Sorts are by X, Y
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y;
--#8 Sorts are by Y, X
SELECT t1.X, t1.Y
FROM #Test t1
INNER MERGE JOIN #Test t2 ON t1.Y = t2.Y AND t1.X = t2.X;
DROP TABLE #TestContext
StackExchange Database Administrators Q#259780, answer score: 15
Revisions (0)
No revisions yet.