HiveBrain v1.2.0
Tags: gotcha · sql · Minor

Why does sql server prefer the nonclustered index over the clustered index?

Submitted by: @import:stackexchange-dba

Problem

I am trying to speed up a table and as I was experimenting I ran into this (what I think is) odd occurrence. I created a clustered index and a nonclustered index that should be the same thing. However, as I have run queries against the table I have found that SQL Server always wants to use the nonclustered index instead of the matching clustered index. On top of that, when needed SQL Server will properly do an index seek on the nonclustered index, but will always perform a scan on the clustered index.

Why does SQL Server prefer the nonclustered index?

And how can I rewrite this so I still have the performance increase but only the clustered index?

I have the following table structure:

CREATE TABLE [dbo].[Variables](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [Header] [varchar](255) NULL,
    [FullVariables] [varchar](max) NULL
)


Clustered index:

ALTER TABLE [dbo].[Variables] ADD  CONSTRAINT [PK_Variables] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)


Nonclustered index:

CREATE UNIQUE NONCLUSTERED INDEX [NonClusteredIndex-20190307-091011] ON [dbo].[Variables]
(
    [ID] ASC
)
INCLUDE ([Header], [FullVariables])


My current knowledge leads me to believe that in this case both of those indexes should contain the data laid out in the same fashion, with [ID] as the key column and [Header] and [FullVariables] stored as extra data at the leaf level of the index rather than as pointers back to the base table. If you have some source of knowledge that you could link, I am more than eager to read more.
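One way to check whether the two indexes really are laid out the same is to compare their leaf-level page counts and average record sizes. This is a hedged sketch using the table and index names above; run it in the relevant database:

```sql
-- Compare on-disk shape of the clustered and nonclustered indexes.
-- 'SAMPLED' mode is needed so avg_record_size_in_bytes is populated.
SELECT i.name AS index_name,
       ps.page_count,
       ps.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Variables'), NULL, NULL, 'SAMPLED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.index_level = 0;  -- leaf level only
```

If both indexes cover the same columns, the page counts and record sizes should come out nearly identical.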

I should specify that I don't always want a seek, and I understand that a scan is better in some cases (otherwise why would we have it). The table contains about 60 GB of data due to the row count (several million) multiplied by the varchar(max) column (which contains strings 16,000+ characters long). Before inserting into the table, a scan is done to ensure no duplicates are inserted (matching on Header for elimination and then on FullVariables). Then the table
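The duplicate check described above can be folded into the insert itself as a single guarded statement. This is a hedged sketch; @Header and @FullVariables are hypothetical parameters, not from the original:

```sql
-- Insert only if no row already matches on both columns.
INSERT INTO dbo.Variables (Header, FullVariables)
SELECT @Header, @FullVariables
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Variables
    WHERE Header        = @Header
      AND FullVariables = @FullVariables
);
```

Note that comparing a varchar(max) column for equality cannot use an ordinary index key (max types are not valid key columns), so the existence check will still read the large values unless the design adds something like a persisted hash of FullVariables to narrow the candidates first.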

Solution

If SQL Server has two indexes to choose from, both of which satisfy ("cover") the query and provide the best possible path to locating and/or sorting the rows, you might expect it to be a coin flip. It's not, though: I believe there was some research done (maybe by me) showing that it picks the most recently created index, or the first one alphabetically, or something else that is otherwise arbitrary.

However, if the "coin flip" involves a choice between a nonclustered index and a clustered index, and again both indexes fully satisfy the query, SQL Server will always lean toward the nonclustered one. Why? Because it is guaranteed to be no wider than the clustered index. The edge case where it is exactly the same width as the clustered index is not treated as a separate consideration.
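You can see this preference play out by forcing each index in turn with a table hint and comparing the resulting plans. A hedged sketch using the names from the question (the literal ID value is made up for illustration):

```sql
-- Optimizer's own pick (expected: the nonclustered index):
SELECT ID, Header, FullVariables
FROM dbo.Variables
WHERE ID = 42;

-- Force the clustered index and compare the plan:
SELECT ID, Header, FullVariables
FROM dbo.Variables WITH (INDEX([PK_Variables]))
WHERE ID = 42;
```

With "Include Actual Execution Plan" enabled, the two statements should show near-identical shapes and costs, which is exactly why the tie-break falls to the nonclustered index.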

You should look at the costs in each execution plan and confirm that the costs SQL Server estimates for the nonclustered index are <= those for the clustered index. If you can produce a counter-example, where the nonclustered index is chosen even though its estimated costs are higher than the clustered index's, please do.
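The estimated costs can be read without executing anything by turning on the textual showplan, which reports TotalSubtreeCost per operator. A hedged sketch for SSMS or sqlcmd (GO is the batch separator; the literal value is illustrative):

```sql
-- Show the estimated plan instead of executing the query.
SET SHOWPLAN_ALL ON;
GO
SELECT ID, Header, FullVariables
FROM dbo.Variables
WHERE ID = 42;
GO
SET SHOWPLAN_ALL OFF;
GO
```

Compare the root operator's TotalSubtreeCost with and without an index hint to check the <= claim above.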

Context

StackExchange Database Administrators Q#235053, answer score: 6
