patternsqlModerate
Index Uniqueness Overhead
Viewed 0 times
overheadindexuniqueness
Problem
I've been having an ongoing debate with various developers in my office on the cost of an index, and whether or not uniqueness is beneficial or costly (probably both). The crux of the issue is our competing resources.
Background
I have previously read a discussion that stated a
My coworker combats this statement by saying that
At worst, I have seen tables with an identity column (inherently unique) that is the clustering key of the table, but explicitly stated as non-unique. On the other side of worst is my obsession with uniqueness, and all indexes are created as unique, and when not possible to define an explicitly unique relation to an index, I append the PK of the table to the end of the index to ensure the uniqueness is guaranteed.
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow. Yes, every index should be evaluated, but when you have five servers with thousands of tables each and as many as twenty indexes on a table, you need to be able to apply some simple rules to ensure a certain level of quality.
Question
Does uniqueness have an additional cost on the back-end of an
Example Table Definition
```
create table #test_index
(
id int not null identity(1, 1),
dt datetime not null default(current_timestamp),
val varch
Background
I have previously read a discussion that stated a
Unique index is no additional cost to maintain, since an Insert operation implicitly checks for where it fits into the B-tree, and, if a duplicate is found in a non-unique index, appends a uniquifier to the end of the key, but otherwise inserts directly. In this sequence of events, a Unique index has no additional cost.My coworker combats this statement by saying that
Unique is enforced as a second operation after the seek to the new position in the B-tree, and thus is more costly to maintain than a non-unique index.At worst, I have seen tables with an identity column (inherently unique) that is the clustering key of the table, but explicitly stated as non-unique. On the other side of worst is my obsession with uniqueness, and all indexes are created as unique, and when not possible to define an explicitly unique relation to an index, I append the PK of the table to the end of the index to ensure the uniqueness is guaranteed.
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow. Yes, every index should be evaluated, but when you have five servers with thousands of tables each and as many as twenty indexes on a table, you need to be able to apply some simple rules to ensure a certain level of quality.
Question
Does uniqueness have an additional cost on the back-end of an
Insert compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?Example Table Definition
```
create table #test_index
(
id int not null identity(1, 1),
dt datetime not null default(current_timestamp),
val varch
Solution
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.
The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.
Does uniqueness have an additional cost on the back-end of an
Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?
Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?
In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the
As David Browne mentioned in a comment:
Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.
Take the following minimally complete and verifiable example:
I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:
Next, we'll several rows to the table:
As you can see above, three rows contain the same value for the
Next, we'll look at the physical page structures for each index, using the undocumented
I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.
You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.
There are several excellent resources in the Interwebz about this topic, including:
The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.
Does uniqueness have an additional cost on the back-end of an
Insert compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?
Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?
In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the
UNIQUE modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.As David Browne mentioned in a comment:
Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.
Take the following minimally complete and verifiable example:
USE tempdb;
DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
id int NOT NULL
CONSTRAINT IndexTest_pk
PRIMARY KEY
CLUSTERED
IDENTITY(1,1)
, rowDate datetime NOT NULL
);I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:
CREATE INDEX IndexTest_rowDate_ix01
ON dbo.IndexTest(rowDate);
CREATE UNIQUE INDEX IndexTest_rowDate_ix02
ON dbo.IndexTest(rowDate, id);Next, we'll several rows to the table:
INSERT INTO dbo.IndexTest (rowDate)
VALUES (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 1, GETDATE()))
, (DATEADD(SECOND, 2, GETDATE()));As you can see above, three rows contain the same value for the
rowDate column, and two rows contain unique values.Next, we'll look at the physical page structures for each index, using the undocumented
DBCC PAGE command:DECLARE @dbid int = DB_ID();
DECLARE @fileid int;
DECLARE @pageid int;
DECLARE @indexid int;
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix01'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix02'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.
You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.
There are several excellent resources in the Interwebz about this topic, including:
- Where Clustered
Code Snippets
USE tempdb;
DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
id int NOT NULL
CONSTRAINT IndexTest_pk
PRIMARY KEY
CLUSTERED
IDENTITY(1,1)
, rowDate datetime NOT NULL
);CREATE INDEX IndexTest_rowDate_ix01
ON dbo.IndexTest(rowDate);
CREATE UNIQUE INDEX IndexTest_rowDate_ix02
ON dbo.IndexTest(rowDate, id);INSERT INTO dbo.IndexTest (rowDate)
VALUES (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 1, GETDATE()))
, (DATEADD(SECOND, 2, GETDATE()));DECLARE @dbid int = DB_ID();
DECLARE @fileid int;
DECLARE @pageid int;
DECLARE @indexid int;
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix01'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix02'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';Context
StackExchange Database Administrators Q#242908, answer score: 16
Revisions (0)
No revisions yet.