gotchasqlMajor
Performance difference between Clustered and Non Clustered Index
Viewed 0 times
clusterednondifferencebetweenperformanceandindex
Problem
I was reading
information will be present in the Clustered Index Column.
form of Clustered Index column(if availabe) or the File Indentifier +
Page Number + Total Rows in a Page. This means that the query engine must take an additional step in order to locate the actual data.
Query - How can I check the performance difference with a help of a practical example as we know that the table can have only one
Clustered and Non Clustered Indexes.Clustered Index - It contains Data Pages. That means the complete rowinformation will be present in the Clustered Index Column.
Non Clustered Index - It only contains the Row Locator information in theform of Clustered Index column(if availabe) or the File Indentifier +
Page Number + Total Rows in a Page. This means that the query engine must take an additional step in order to locate the actual data.
Query - How can I check the performance difference with a help of a practical example as we know that the table can have only one
Clustered Index and provides sorting at the Clustered Index Column and Non Clustered Index don't provide sorting and can support 999 Non Clustered Indexes in SQL Server 2008 and 249 in SQL Server 2005.Solution
Very good question as it is such a important concept. This is a big topic though and what I am going to show you is a simplification so you can understand the base concepts.
Firstly when you see clustered index think table. In SQL server if a table does not contain a clustered index it is a heap. Creating a clustered index on the table actually transforms the table into a b-tree type structure. Your clustered index IS your table it is not separate from the table
Ever wondered why you can only have one clustered index? Well if we had two clustered indexes we would need two copies of the table. It contains the data after all.
I am going to try and explain this by using a simple example.
NOTE: I created the table in this example and filled it with over 3 million random entries. Then ran the actual queries and pasted the execution plans here.
What you really need to grasp is O notation or operational efficiency. Let's assume you have the following table.
So here we have basic table with a clustered key on CustomerID (Primary key is clustered by default). Thus the table is arranged/ordered based on the primary key CustomerID. The intermediate levels will contain the CustomerID values. The data pages will contain the whole row thus it is the table row.
We will also create a non-clustered index on the CustomerName field. The following code will do it.
So in this index you would find on the data pages/leaf level nodes a pointer to the intermediate levels in the clustered index. The index is arranged/ordered around the CustomerName field. Thus the intermediate level contains the CustomerName values and the leaf level will contain the pointer (these pointer values are actually the primary key values or CustomerID column).
Right so if we execute the following query:
SQL will probably read the clustered index via a seek operation. A seek operation is a binary search which is much more efficient than a scan which is a sequential search. So in our above example the index is read and by using a binary search SQL can eliminate the data that don't match the criteria we are looking for. See attached screen shot for the query plan.
So the number of operations or O Notation for the seek operation is as follows:
So it is two operations. However if we executed the following query:
SQL will now use the non-clustered index on the CustomerName to do the search. However since this is a non-clustered index it does not contain the all of the data in the row.
So SQL will do the search on the intermediate levels to find the records that match then do a lookup using the values returned to do another search on the clustered index(aka the table) to retrieve the actual data. This sounds confusing I know but read on and all will become clear.
Since our non-clustered index only contains the CustomerName field(the indexed field values stored in the intermediate nodes) and the pointer to the data which is the CustomerID, the index has no record of the CustomerSurname. The CustomerSurname has to be fetched from the clustered index or table.
When running this query I get the following execution plan:
There are two important things for you to notice in the screen shot above
Why is SQL suggesting the index on CustomerName again? Well since the index contains only the CustomerID and the CustomerName SQL still has to find the CustomerSurname from the table/clustered indexes.
If we created the index and we included the CustomerSurname column in the index SQL would be able to satisfy the entire query by just reading the non-clustered index. This is why SQL is suggesting I change my
Firstly when you see clustered index think table. In SQL server if a table does not contain a clustered index it is a heap. Creating a clustered index on the table actually transforms the table into a b-tree type structure. Your clustered index IS your table it is not separate from the table
Ever wondered why you can only have one clustered index? Well if we had two clustered indexes we would need two copies of the table. It contains the data after all.
I am going to try and explain this by using a simple example.
NOTE: I created the table in this example and filled it with over 3 million random entries. Then ran the actual queries and pasted the execution plans here.
What you really need to grasp is O notation or operational efficiency. Let's assume you have the following table.
CREATE TABLE [dbo].[Customer](
[CustomerID] [int] IDENTITY(1,1) NOT NULL,
[CustomerName] [varchar](100) NOT NULL,
[CustomerSurname] [varchar](100) NOT NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
, IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]So here we have basic table with a clustered key on CustomerID (Primary key is clustered by default). Thus the table is arranged/ordered based on the primary key CustomerID. The intermediate levels will contain the CustomerID values. The data pages will contain the whole row thus it is the table row.
We will also create a non-clustered index on the CustomerName field. The following code will do it.
CREATE NONCLUSTERED INDEX [ix_Customer_CustomerName] ON [dbo].[Customer]
(
[CustomerName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF
, DROP_EXISTING = OFF, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]So in this index you would find on the data pages/leaf level nodes a pointer to the intermediate levels in the clustered index. The index is arranged/ordered around the CustomerName field. Thus the intermediate level contains the CustomerName values and the leaf level will contain the pointer (these pointer values are actually the primary key values or CustomerID column).
Right so if we execute the following query:
SELECT * FROM Customer WHERE CustomerID = 1SQL will probably read the clustered index via a seek operation. A seek operation is a binary search which is much more efficient than a scan which is a sequential search. So in our above example the index is read and by using a binary search SQL can eliminate the data that don't match the criteria we are looking for. See attached screen shot for the query plan.
So the number of operations or O Notation for the seek operation is as follows:
- Do binary search on clustered index by comparing the value searched for to the values in the intermediate level.
- Return the values that match( remember since the clustered index has all the data in it can return all the columns from the index as it is the row data)
So it is two operations. However if we executed the following query:
SELECT * FROM Customer WHERE CustomerName ='John'SQL will now use the non-clustered index on the CustomerName to do the search. However since this is a non-clustered index it does not contain the all of the data in the row.
So SQL will do the search on the intermediate levels to find the records that match then do a lookup using the values returned to do another search on the clustered index(aka the table) to retrieve the actual data. This sounds confusing I know but read on and all will become clear.
Since our non-clustered index only contains the CustomerName field(the indexed field values stored in the intermediate nodes) and the pointer to the data which is the CustomerID, the index has no record of the CustomerSurname. The CustomerSurname has to be fetched from the clustered index or table.
When running this query I get the following execution plan:
There are two important things for you to notice in the screen shot above
- SQL is saying I have a missing index(the text in green). SQL is suggesting I create a index on CustomerName which includes CustomerID and CustomerSurname.
- You will also see that 99% of the time of the query is spent on doing a key lookup on the primary key index/clustered index.
Why is SQL suggesting the index on CustomerName again? Well since the index contains only the CustomerID and the CustomerName SQL still has to find the CustomerSurname from the table/clustered indexes.
If we created the index and we included the CustomerSurname column in the index SQL would be able to satisfy the entire query by just reading the non-clustered index. This is why SQL is suggesting I change my
Code Snippets
CREATE TABLE [dbo].[Customer](
[CustomerID] [int] IDENTITY(1,1) NOT NULL,
[CustomerName] [varchar](100) NOT NULL,
[CustomerSurname] [varchar](100) NOT NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
, IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]CREATE NONCLUSTERED INDEX [ix_Customer_CustomerName] ON [dbo].[Customer]
(
[CustomerName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF
, DROP_EXISTING = OFF, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]SELECT * FROM Customer WHERE CustomerID = 1SELECT * FROM Customer WHERE CustomerName ='John'SELECT CustomerID
FROM Customer
WHERE CustomerName='Jane'Context
StackExchange Database Administrators Q#18528, answer score: 44
Revisions (0)
No revisions yet.