HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

How Does SQL Server Choose Indexes to Use

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sqlchooseindexesdoeshowserveruse

Problem

I've recently begun working on indexing the various views in our DB. Often, some of these tables are only used for joins, and rarely are they used in where or order by statements. What this means, is that we are creating multiple indexes (or one large relatively less efficient index) which are all keyed on the primary key. I should clarify that there are no where or order by containing anything from this table and there are no key-lookups in the views I am having issues with.

If SQL has multiple indexes that are all keyed on tbl.ID, how does it make the choice to use one index over the other? I find that it can vary between views even if the data pulled from the index will be the same. Generally, there is not a huge gain/loss in efficiency in choosing one index over the other; leading me to believe that it chooses a 'good enough' index and moves on (since there are many keyed off of the same column).

Further, should creating multiple indexes with the same key be avoided for this reason?

The specific problem i am currently dealing with has the view choosing an index like this:

CREATE NONCLUSTERED INDEX [index1]
ON a.tbl ([ID])
INCLUDE (number,name,year, ...)


Where there are 14 columns in the include (it's unfortunate, but required based on our current query structure)

Over the more efficient index:

CREATE NONCLUSTERED INDEX [index2]
ON a.tbl ([ID])
INCLUDE (number,name,year, ...)


Where there are only 5 columns in the include.

Both indexes have the information that the view needs, but one is obviously much smaller and more efficient to use. The larger index needs to exist to cover a massive view that calls all of the columns in it (roughly half of the table).

It is probably worth mentioning that I could choose to create one massive index as above, which covers almost every view slightly less efficiently than these smaller indexes do. From my testing, it seems that adding these additional indexes does not add a significant cost to updating th

Solution

In my limited experimentation:

  • When the query has to perform a scan, SQL Server does the work to find the narrowest index every time.



  • When the query is expected to perform a seek, it doesn't care about index width, it just uses the last covering index created.



This kind of makes sense. If you're going to add rules to search for the perfect index in either of these cases, you should do it when it's going to save you the most. On a seek the effort spent in finding a better index when it won't really matter in terms of savings for all rows as opposed to per row is far less likely to be worth it. "Last in" is maybe intentional, with the assumption that the last covering index you created is probably the best one you've created, or it may just be arbitrary and coincidental (like the order of columns in a missing index recommendation).

Proof (well, sort of proof)

My test setup was pretty simple:

CREATE TABLE dbo.t1
(
  id int IDENTITY(1,1) PRIMARY KEY,
  sn1 sysname, tn1 sysname, cn1 sysname, typ1 sysname,
  sn2 sysname, tn2 sysname, cn2 sysname, typ2 sysname,
  sn3 sysname, tn3 sysname, cn3 sysname, typ3 sysname,
  sn4 sysname, tn4 sysname, cn4 sysname, typ4 sysname
);
GO
SET NOCOUNT ON;
GO
INSERT dbo.t1
(
  sn1,tn1,cn1,typ1,sn2,tn2,cn2,typ2,
  sn3,tn3,cn3,typ3,sn4,tn4,cn4,typ4
)
SELECT s.name, t.name, c.name, typ.name,
       s.name, t.name, c.name, typ.name,
       s.name, t.name, c.name, typ.name,
       s.name, t.name, c.name, typ.name
FROM sys.schemas AS s
CROSS JOIN sys.objects AS t
INNER JOIN sys.all_columns AS c
ON t.[object_id] = c.[object_id]
INNER JOIN sys.types AS typ
ON c.user_type_id = typ.user_type_id;
GO

CREATE VIEW dbo.v1
AS
  SELECT id,sn1,tn1
  FROM dbo.t1;
GO


This put 13,260 rows into my table (your results will vary). Then I repeatedly created the same three indexes, in different order:

-- widest first
CREATE INDEX ix_wide ON dbo.t1(sn1) 
  INCLUDE(tn1,cn1,typ1,sn2,tn2,cn2,typ2,sn3,tn3,cn3,typ3,sn4,tn4,cn4,typ4);
GO
CREATE INDEX ix_mid ON dbo.t1(sn1) INCLUDE(tn1,cn1,sn2,tn2,cn2,typ2,sn3,tn3);
GO
CREATE INDEX ix_small ON dbo.t1(sn1) INCLUDE(tn1,cn1);
GO

-- widest last = ix_small then ix_mid then ix_wide
-- middle1 = ix_mid then ix_wide then ix_small
-- middle2 = ix_small then ix_wide then ix_mid


Then in each of those four cases I ran these two queries and investigated the plans:

DBCC FREEPROCCACHE;
GO
SELECT id,sn1,tn1 FROM dbo.v1; -- scan
GO
SELECT id,sn1,tn1 FROM dbo.v1 WHERE sn1 LIKE N'q%'; -- seek


Results:

widest first  widest last  middle1   middle2
-------  ------------  -----------  --------  --------
   scan    ix_small      ix_small   ix_small  ix_small
   seek    ix_small      ix_wide    ix_small  ix_mid


The scan always chose the narrowest index. The seek always chose the index that was created last (since all cover). I didn't extend the tests to also include a non-covering index in the mix, because I don't suspect it will change this outcome.

Morals (well, sort of morals)

There are two morals here:

  • Create your widest indexes first if you want your narrowest indexes to be used in seeks when they can.



  • It's probably a narrow set of use cases where this optimization is worth it. Such as seeks that return a lot of rows and the wider covering index has very large columns that aren't in the narrower index. In these cases it might be worth explicitly specifying the index with a hint (and all the caveats that involves) rather than relying on the behavior I've observed here.



Probably some internals and heuristics I'm missing here that Paul will clarify. Also I tried to enable trace flag 302 to output index selection information, but this doesn't seem to work on modern versions of SQL Server.

Code Snippets

CREATE TABLE dbo.t1
(
  id int IDENTITY(1,1) PRIMARY KEY,
  sn1 sysname, tn1 sysname, cn1 sysname, typ1 sysname,
  sn2 sysname, tn2 sysname, cn2 sysname, typ2 sysname,
  sn3 sysname, tn3 sysname, cn3 sysname, typ3 sysname,
  sn4 sysname, tn4 sysname, cn4 sysname, typ4 sysname
);
GO
SET NOCOUNT ON;
GO
INSERT dbo.t1
(
  sn1,tn1,cn1,typ1,sn2,tn2,cn2,typ2,
  sn3,tn3,cn3,typ3,sn4,tn4,cn4,typ4
)
SELECT s.name, t.name, c.name, typ.name,
       s.name, t.name, c.name, typ.name,
       s.name, t.name, c.name, typ.name,
       s.name, t.name, c.name, typ.name
FROM sys.schemas AS s
CROSS JOIN sys.objects AS t
INNER JOIN sys.all_columns AS c
ON t.[object_id] = c.[object_id]
INNER JOIN sys.types AS typ
ON c.user_type_id = typ.user_type_id;
GO

CREATE VIEW dbo.v1
AS
  SELECT id,sn1,tn1
  FROM dbo.t1;
GO
-- widest first
CREATE INDEX ix_wide ON dbo.t1(sn1) 
  INCLUDE(tn1,cn1,typ1,sn2,tn2,cn2,typ2,sn3,tn3,cn3,typ3,sn4,tn4,cn4,typ4);
GO
CREATE INDEX ix_mid ON dbo.t1(sn1) INCLUDE(tn1,cn1,sn2,tn2,cn2,typ2,sn3,tn3);
GO
CREATE INDEX ix_small ON dbo.t1(sn1) INCLUDE(tn1,cn1);
GO

-- widest last = ix_small then ix_mid then ix_wide
-- middle1 = ix_mid then ix_wide then ix_small
-- middle2 = ix_small then ix_wide then ix_mid
DBCC FREEPROCCACHE;
GO
SELECT id,sn1,tn1 FROM dbo.v1; -- scan
GO
SELECT id,sn1,tn1 FROM dbo.v1 WHERE sn1 LIKE N'q%'; -- seek
widest first  widest last  middle1   middle2
-------  ------------  -----------  --------  --------
   scan    ix_small      ix_small   ix_small  ix_small
   seek    ix_small      ix_wide    ix_small  ix_mid

Context

StackExchange Database Administrators Q#210044, answer score: 9

Revisions (0)

No revisions yet.