patternsqlModerate
Missing Non Clustered Index already part of Clustered Index
Viewed 0 times
clusterednonpartalreadymissingindex
Problem
I'm debugging a slow running query and in the execution plan a non-clustered index is suggested, with 51.6648 Impact. However, the non-clustered index only includes columns that are already in the Primary Key (PK) Composite Clustered Index.
Could this be because of the order of the columns in the index? i.e. if the columns in the clustered index are not in order from most selective to least then is there potential for a non-clustered index to improve performance?
In addition the non-clustered index only contains two of the three PK columns with the third added as an included column. Is the
Below is an example of the table structures I am working with:
Tables-
The table
Could this be because of the order of the columns in the index? i.e. if the columns in the clustered index are not in order from most selective to least then is there potential for a non-clustered index to improve performance?
In addition the non-clustered index only contains two of the three PK columns with the third added as an included column. Is the
include another reason why use of the non-clustered index could be more optimal?Below is an example of the table structures I am working with:
Tables-
Retailers (
RetailerID int PK,
name ...)
Retailer_Relation_Types (
RelationType smallint PK,
Description nvarchar(50) ...)
Retailer_Relations (
RetailerID int PK FK,
RelatedRetailerID int PK FK,
RelationType smallint PK FK,
CreatedOn datetime ...)The table
Retailer_Relations has the following composite PK index and suggested index-CONSTRAINT PK_Retailer_Relations
PRIMARY KEY CLUSTERED (
RetailerID ASC,
RelatedRetailerID ASC,
RelationType ASC
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX
ON Retailer_Relations (
RetailerID,
RelationType
)
INCLUDE (
RelatedRetailerID
)Solution
The table Retailer_Relations has the following composite PK index and
suggested index-
While missing indexes could be helpful and could definitely work, I would not spend too much time on missing indexes, these hints are created on the estimated execution plan, not on the actual execution plan.
More precisely, these index hints are based on the premise of reducing the cost of Query Bucks™ used by operators in the plan. The optimizer calculates the estimated costs, and adds missing index hints accordingly.
As a result they could be very wrong. If you are unsure if it is going to help, the best thing to do is test the situation before and after.
You could do this by adding the statement
Also, you could use statisticsparser to make it easier to read these statistics.
Could this be because of the order of the columns in the index?
That is correct, creating the missing index can improve the selectivity on queries, for example if your query looks like this:
or like this:
The reasoning behind this is that both indexes could seek on RetailerID, that part is not going to change.
But what if extra filters/ordering is applied on RelationType?
It would be all over the place in the clustered index, as a result of it being the third key value, not the second key value. And as we know, it is the second key value in the NCI.
Okay, but when or how would the nonclustered index improve the query?
A couple of cases could be:
high, resulting in the possible need of the nonclustered index (Query #1)
large (Query #2).
NCI Side note
As a side note, adding the key columns to the include list in your NCI is not exactly needed, since CI key columns are automatically included in all Non clustered indexes.
You could opt to do so if you are not sure if the clustered index will remain the same, and want the column to always be included.
Regarding the query itself, if you added the execution plan via PasteThePlan we could give some more information on indexing / improving the query.
Testing
Create table and add some rows
Query #1
Plan without index Here
While it is doing a seek, it is doing a seek on RetailerID. Afterwards it is issueing a residual I/O predicate on RelationType
Add the index
The residual predicate is gone, everything happens in a seek predicate, on both columns.
Execution plan
With the second query, the added index helpfullness becomes even more obvious:
Plan without the index, with a Sort operator:
Plan with the index, using the index removes the sort operator
suggested index-
While missing indexes could be helpful and could definitely work, I would not spend too much time on missing indexes, these hints are created on the estimated execution plan, not on the actual execution plan.
More precisely, these index hints are based on the premise of reducing the cost of Query Bucks™ used by operators in the plan. The optimizer calculates the estimated costs, and adds missing index hints accordingly.
As a result they could be very wrong. If you are unsure if it is going to help, the best thing to do is test the situation before and after.
You could do this by adding the statement
SET STATISTICS IO, TIME ON; before running the query.Also, you could use statisticsparser to make it easier to read these statistics.
Could this be because of the order of the columns in the index?
That is correct, creating the missing index can improve the selectivity on queries, for example if your query looks like this:
SELECT RelatedRetailerID
FROM Retailer_Relations
WHERE
RetailerID = 5 AND
RelationType = 20;or like this:
SELECT RelatedRetailerID
FROM Retailer_Relations
ORDER BY
RetailerID,
RelationType;The reasoning behind this is that both indexes could seek on RetailerID, that part is not going to change.
But what if extra filters/ordering is applied on RelationType?
It would be all over the place in the clustered index, as a result of it being the third key value, not the second key value. And as we know, it is the second key value in the NCI.
Okay, but when or how would the nonclustered index improve the query?
A couple of cases could be:
- If relationType filters a lot of values, the residual I/O could be
high, resulting in the possible need of the nonclustered index (Query #1)
- Ordering on the two columns occurs (One way), and the resultset is
large (Query #2).
- As @AaronBertrand mentioned: if the CI size difference compared to the NCI is of a considerable amount, adding the NCI will reduce the pages read by queries that benefit from it.
NCI Side note
As a side note, adding the key columns to the include list in your NCI is not exactly needed, since CI key columns are automatically included in all Non clustered indexes.
You could opt to do so if you are not sure if the clustered index will remain the same, and want the column to always be included.
Regarding the query itself, if you added the execution plan via PasteThePlan we could give some more information on indexing / improving the query.
Testing
Create table and add some rows
CREATE TABLE Retailer_Relations (
RetailerID int ,
RelatedRetailerID int ,
RelationType smallint,
CreatedOn datetime,
CONSTRAINT PK_Retailer_Relations
PRIMARY KEY CLUSTERED (
RetailerID ASC,
RelatedRetailerID ASC,
RelationType ASC
) ON [PRIMARY])
DECLARE @I Int = 1
WHILE @I < 1000
BEGIN
INSERT INTO Retailer_Relations(RetailerID,RelatedRetailerID,RelationType,CreatedOn)
VALUES(@I,@I,@I,GETDATE()
)
set @I += 1
ENDQuery #1
SELECT RelatedRetailerID
FROM Retailer_Relations
WHERE
RetailerID = 5 AND
RelationType = 20;Plan without index Here
While it is doing a seek, it is doing a seek on RetailerID. Afterwards it is issueing a residual I/O predicate on RelationType
Add the index
CREATE NONCLUSTERED INDEX IX_TEST
ON Retailer_Relations (
RetailerID,
RelationType
)
INCLUDE (
RelatedRetailerID
)The residual predicate is gone, everything happens in a seek predicate, on both columns.
Execution plan
With the second query, the added index helpfullness becomes even more obvious:
SELECT RelatedRetailerID
FROM Retailer_Relations
ORDER BY
RetailerID,
RelationType;Plan without the index, with a Sort operator:
Plan with the index, using the index removes the sort operator
Code Snippets
SELECT RelatedRetailerID
FROM Retailer_Relations
WHERE
RetailerID = 5 AND
RelationType = 20;SELECT RelatedRetailerID
FROM Retailer_Relations
ORDER BY
RetailerID,
RelationType;CREATE TABLE Retailer_Relations (
RetailerID int ,
RelatedRetailerID int ,
RelationType smallint,
CreatedOn datetime,
CONSTRAINT PK_Retailer_Relations
PRIMARY KEY CLUSTERED (
RetailerID ASC,
RelatedRetailerID ASC,
RelationType ASC
) ON [PRIMARY])
DECLARE @I Int = 1
WHILE @I < 1000
BEGIN
INSERT INTO Retailer_Relations(RetailerID,RelatedRetailerID,RelationType,CreatedOn)
VALUES(@I,@I,@I,GETDATE()
)
set @I += 1
ENDSELECT RelatedRetailerID
FROM Retailer_Relations
WHERE
RetailerID = 5 AND
RelationType = 20;CREATE NONCLUSTERED INDEX IX_TEST
ON Retailer_Relations (
RetailerID,
RelationType
)
INCLUDE (
RelatedRetailerID
)Context
StackExchange Database Administrators Q#228073, answer score: 12
Revisions (0)
No revisions yet.