HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Creating an Index that is not used by a (SELECT) query reduces performance of that query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
reducescreatingusedqueryselectthatperformanceindexnot

Problem

I have just watched this video by Pinal Dave.

He has a SELECT query that produces ~370k reads in tempdb and ~1200 reads of the table the query is SELECTing from.

He then creates an index (Let's call it, Index1) which removes the tempdb spool and therefore improves the performance of the query. All OK so far.

However, he then creates a further index (we'll call it Index2) and leaves the Index1 intact.

He then runs his query again and despite Index2 not being used, the query performance returns to how it was originally, with the ~370k tempdb spool still in place.

He doesn't actually seem to desribe what causes this (unless I missed it)

The code to reproduce is below (thanks to Martin Smith for providing the Pastebin) This assumes a vanilla version of AdventureWorks with the standard indexes on SalesOrderDetail

SET STATISTICS XML ON;
SET STATISTICS IO ON
GO

-- The query
DBCC FREEPROCCACHE;
SELECT SalesOrderID, ProductId,SalesOrderDetailID, OrderQty
FROM   Sales.SalesOrderDetail sod
WHERE  ProductID = (SELECT AVG(ProductID)
                    FROM   Sales.SalesOrderDetail sod1
                    WHERE  sod.SalesOrderID = sod1.SalesOrderID
                    GROUP  BY sod1.SalesOrderID);
/* 
(11110 rows affected)
Table 'Worktable'. Scan count 3, logical reads 368495, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 2, page server reads 0, read-ahead reads 1284, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
*/


here is the plan before we do anything (nested loops and table spools)

Then we create an index

```
CREATE NONCLUSTERED INDEX IX_Index1 ON Sales.SalesOrderDetail (SalesOrderID, P

Solution

Current State

In your "good" case the plan looks as below. The upper branch is sorted by SalesOrderID (and therefore also by SalesOrderID,ProductId as the grouping ensures there is only one row per SalesOrderID). The lower branch reads the index in SalesOrderID,ProductId order and they are merge joined together.

Your "bad" case uses a typical execution plan for a row-mode windowed aggregate (with a common subexpression spool). It isn't as bad as the linked video makes out. Reads are reported per row read not per page read for these work tables (so multiplying reads by 8 KB to calculate the data in the spool is certainly invalid) but nonetheless the "bad" case is costed more according to SQL Server's cost model so why does it choose it?

Merge join refresher

A merge join on columns X,Y requires both inputs to the merge join be ordered in a compatible manner. At a minimum both inputs must be ordered at least by the same initial column (i.e. both ordered by X or both ordered by Y). For greatest efficiency (i.e. ideally to avoid overhead of "many to many" merge join) they should generally both be ordered in the same manner for all keys involved in the equi join predicate - i.e. either ORDER BY X,Y or ORDER BY Y,X (the ASC, DESC direction of each key is not important but must be the same in both inputs)

How does it choose between ORDER BY X,Y and ORDER BY Y,X?

This seems to be a general limitation of merge join and composite join predicates. In the absence of any external reason to choose a specified ordering (e.g. an explicit ORDER BY) it will just decide a somewhat arbitrary column order.

For table sources apparently it will look for the first index it comes across that is suitable for providing the required columns in either order and adopt the key column order from that as the order required for the merge join.

The order used for index matching appears to be in reverse order of indexid. This will generally correlate with index creation order but not always (as clustered indexes are reserved an indexid of 1 or a gap in ids may be filled in following a DROP INDEX)

It will not do any analysis as to whether subsequent indexes might be better suited (e.g. as narrower or more compatible with index hints already used in the query).

I'm not 100% certain that this is the code responsible but the call stack below indicates that the implementation rule for sort merge join is looking in table metadata to find out some "Natural Sort"

Another example of adding an index making the plan worse - concentrating on merge join

CREATE TABLE #Test(X INT, Y INT, Filler CHAR(8000),INDEX IX1 UNIQUE /*Index id = 2*/ (X,Y));

INSERT INTO #Test (X,Y) SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY 1/0), ROW_NUMBER() OVER (ORDER BY 1/0) FROM sys.all_objects o1, sys.all_objects o2;

SET STATISTICS IO ON;

--#1 logical reads 60 uses IX1 - merge join chosen organically
SELECT t1.X, t1.Y
FROM #Test t1 
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y

SET STATISTICS IO OFF;
CREATE UNIQUE CLUSTERED INDEX ix2 ON #Test(Y,X) 
SET STATISTICS IO ON;

--#2 logical reads 50, Still using IX1 and merge join. The clustered index just created has an id of 1 so lower than 2. 
--IX1 no longer contains a RID so reads a bit lower than previously
SELECT t1.X, t1.Y
FROM #Test t1 
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y

SET STATISTICS IO OFF;
CREATE UNIQUE INDEX ix3 ON #Test(Y,X) INCLUDE (Filler);
SET STATISTICS IO ON;

--#3 logical reads 20,068 - No longer chooses MERGE join of its own accord and if forced uses more expensive index
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y;

--#4 Back to 50 reads. The merge join is happy to use the order required by the ORDER BY
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y
ORDER BY t1.X, t1.Y;

--#5 50 reads but now uses hash join 
SELECT t1.X, t1.Y
FROM #Test t1 
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y+0 = t2.Y+0;

--#6 50 reads, Forcing the merge join and looking at properties shows it is now seen as "many to many"
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y+0 = t2.Y+0;

SET STATISTICS IO OFF;

--What if there is no useful index?
DROP INDEX ix3 ON #Test
DROP INDEX ix1 ON #Test
DROP INDEX ix2 ON #Test
CREATE CLUSTERED INDEX ix4 ON #Test(X)

--#7 Sorts are by X, Y 
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y;

--#8 Sorts are by Y, X 
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON  t1.Y = t2.Y AND t1.X = t2.X;

DROP TABLE #Test


  • #1 There is a covering non clustered index on the table with columns (X,Y) - Merge join is chosen naturally (without hints) and uses index ordered scan.



  • #2 Add a clustered index with keys ordered (Y,X). Merge join still chosen naturally and using originally created index. The clustered index will have an id of 1 - which is lower tha

Code Snippets

CREATE TABLE #Test(X INT, Y INT, Filler CHAR(8000),INDEX IX1 UNIQUE /*Index id = 2*/ (X,Y));

INSERT INTO #Test (X,Y) SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY 1/0), ROW_NUMBER() OVER (ORDER BY 1/0) FROM sys.all_objects o1, sys.all_objects o2;

SET STATISTICS IO ON;

--#1 logical reads 60 uses IX1 - merge join chosen organically
SELECT t1.X, t1.Y
FROM #Test t1 
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y

SET STATISTICS IO OFF;
CREATE UNIQUE CLUSTERED INDEX ix2 ON #Test(Y,X) 
SET STATISTICS IO ON;

--#2 logical reads 50, Still using IX1 and merge join. The clustered index just created has an id of 1 so lower than 2. 
--IX1 no longer contains a RID so reads a bit lower than previously
SELECT t1.X, t1.Y
FROM #Test t1 
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y

SET STATISTICS IO OFF;
CREATE UNIQUE INDEX ix3 ON #Test(Y,X) INCLUDE (Filler);
SET STATISTICS IO ON;

--#3 logical reads 20,068 - No longer chooses MERGE join of its own accord and if forced uses more expensive index
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y;

--#4 Back to 50 reads. The merge join is happy to use the order required by the ORDER BY
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y
ORDER BY t1.X, t1.Y;

--#5 50 reads but now uses hash join 
SELECT t1.X, t1.Y
FROM #Test t1 
INNER JOIN #Test t2 ON t1.X = t2.X and t1.Y+0 = t2.Y+0;

--#6 50 reads, Forcing the merge join and looking at properties shows it is now seen as "many to many"
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y+0 = t2.Y+0;

SET STATISTICS IO OFF;

--What if there is no useful index?
DROP INDEX ix3 ON #Test
DROP INDEX ix1 ON #Test
DROP INDEX ix2 ON #Test
CREATE CLUSTERED INDEX ix4 ON #Test(X)

--#7 Sorts are by X, Y 
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON t1.X = t2.X and t1.Y = t2.Y;

--#8 Sorts are by Y, X 
SELECT t1.X, t1.Y
FROM #Test t1 
INNER MERGE JOIN #Test t2 ON  t1.Y = t2.Y AND t1.X = t2.X;


DROP TABLE #Test

Context

StackExchange Database Administrators Q#259780, answer score: 15

Revisions (0)

No revisions yet.