Do natural keys provide higher or lower performance in SQL Server than surrogate integer keys?
Problem
I'm a fan of surrogate keys, so there is a risk that my findings are subject to confirmation bias.
Many questions I've seen both here and at http://stackoverflow.com use natural keys instead of surrogate keys based on IDENTITY() values.
My background in computer systems tells me that performing any comparative operation on an integer will be faster than comparing strings.
This made me question my beliefs, so I thought I would create a test to investigate my thesis that integers are faster than strings for use as keys in SQL Server.
Since there is likely to be very little discernible difference with small data sets, I immediately thought of a two-table setup where the primary table has 1,000,000 rows and the secondary table has 10 rows for each row in the primary table, for a total of 10,000,000 rows in the secondary table. The premise of my test is to create two sets of tables like this, one using natural keys and one using integer keys, and to run timing tests on a simple query like:
SELECT *
FROM Table1
INNER JOIN Table2 ON Table1.Key = Table2.Key;
The following is the code I created as a test bed:
```
USE Master;
IF (SELECT COUNT(database_id) FROM sys.databases d WHERE d.name = 'NaturalKeyTest') = 1
BEGIN
ALTER DATABASE NaturalKeyTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE NaturalKeyTest;
END
GO
CREATE DATABASE NaturalKeyTest
ON (NAME = 'NaturalKeyTest', FILENAME =
'C:\SQLServer\Data\NaturalKeyTest.mdf', SIZE=8GB, FILEGROWTH=1GB)
LOG ON (NAME='NaturalKeyTestLog', FILENAME =
'C:\SQLServer\Logs\NaturalKeyTest.ldf', SIZE=256MB, FILEGROWTH=128MB);
GO
ALTER DATABASE NaturalKeyTest SET RECOVERY SIMPLE;
GO
USE NaturalKeyTest;
GO
CREATE VIEW GetRand
AS
SELECT RAND() AS RandomNumber;
GO
CREATE FUNCTION RandomString
(
@StringLength INT
)
RETURNS NVARCHAR(max)
AS
BEGIN
DECLARE @cnt INT = 0
DECLARE @str NVARCHAR(MAX) = '';
DECLARE @RandomNum FLOAT = 0;
WHILE @cnt < @StringLength
BEGIN
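-- [The original snippet is truncated at this point. The lines below are an assumed,
-- minimal completion rather than the author's exact code: RAND() cannot be called
-- directly inside a user-defined function, which is why the GetRand view exists;
-- each pass of the loop appends one random uppercase letter.]
SELECT @RandomNum = RandomNumber FROM GetRand;
SET @str = @str + CHAR(CAST(65 + (@RandomNum * 26.0) AS INT));
SET @cnt = @cnt + 1;
END
RETURN @str;
END
GO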
```
Solution
In general, SQL Server uses B+Trees for indexes. The expense of an index seek is directly related to the length of the key in this storage format. Hence, a surrogate key usually outperforms a natural key on index seeks.
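To illustrate the point (a minimal sketch with invented table and column names, not taken from the original post), the difference in key width shows up directly in the depth and size of the B+Tree:
```
-- Hypothetical comparison (invented names): a 4-byte surrogate key versus a wide
-- natural key. The wider the key, the deeper and larger the B+Tree.
CREATE TABLE dbo.CustomerSurrogate
(
    CustomerId INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, -- 4 bytes per index entry
    Email      NVARCHAR(50)       NOT NULL
);

CREATE TABLE dbo.CustomerNatural
(
    Email NVARCHAR(50) NOT NULL PRIMARY KEY              -- up to 100 bytes per index entry
);

-- After loading comparable data, index depth and page counts can be compared with:
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       ips.index_depth,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips
WHERE ips.object_id IN (OBJECT_ID('dbo.CustomerSurrogate'), OBJECT_ID('dbo.CustomerNatural'));
```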
SQL Server clusters a table on the primary key by default. The clustered index key is used to identify rows, so it gets added as an included column to every other index. The wider that key, the larger every secondary index becomes.
Even worse, if the secondary indexes are not explicitly defined as UNIQUE, the clustered index key automatically becomes part of the key of each of those. That usually applies to most indexes, as indexes are normally declared unique only when the requirement is to enforce uniqueness.
So if the question is natural versus surrogate clustered index, the surrogate will almost always win.
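As a rough illustration (hypothetical table and index names, not from the original answer), a wide natural clustered key is silently copied into every nonclustered index, and for non-unique indexes it becomes part of their key:
```
-- Hypothetical table (invented names) with a wide natural clustered key.
CREATE TABLE dbo.OrderNatural
(
    OrderNumber NVARCHAR(40)   NOT NULL PRIMARY KEY CLUSTERED, -- wide clustered key
    OrderDate   DATE           NOT NULL,
    Amount      DECIMAL(10, 2) NOT NULL
);

-- Not declared UNIQUE, so OrderNumber is appended to this index's key,
-- not merely stored at the leaf level:
CREATE INDEX IX_OrderNatural_OrderDate ON dbo.OrderNatural (OrderDate);

-- The extra width shows up in the nonclustered index's record size and page count:
SELECT i.name AS IndexName,
       ips.avg_record_size_in_bytes,
       ips.page_count
FROM sys.indexes AS i
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), i.object_id, i.index_id, NULL, 'DETAILED') AS ips
WHERE i.object_id = OBJECT_ID('dbo.OrderNatural');
```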
On the other hand, adding that surrogate column makes the table itself bigger, which makes clustered index scans more expensive. So if you have only very few secondary indexes and your workload often requires looking at all (or most of) the rows, you might actually be better off with a natural key, saving those few extra bytes per row.
Finally, natural keys often make it easier to understand the data model. While they use more storage space, natural primary keys lead to natural foreign keys, which in turn increase local information density: the foreign key column itself carries a meaningful value, so some queries no longer need a join to the parent table.
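For example (a hypothetical schema, not from the original answer), a natural foreign key lets the child table answer questions on its own:
```
-- Hypothetical schema: the natural foreign key CountryCode is meaningful on its own.
CREATE TABLE dbo.Country
(
    CountryCode CHAR(2)       NOT NULL PRIMARY KEY,   -- natural key, e.g. 'DE', 'US'
    CountryName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.Customer
(
    CustomerId   INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    CustomerName NVARCHAR(100)      NOT NULL,
    CountryCode  CHAR(2)            NOT NULL
        REFERENCES dbo.Country (CountryCode)           -- natural foreign key
);

-- No join back to dbo.Country is needed to filter by country:
SELECT CustomerName
FROM dbo.Customer
WHERE CountryCode = 'DE';
```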
So, as so often in the database world, the real answer is "it depends". And, as always, test in your own environment with realistic data.
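A minimal sketch of such a timing test, assuming the natural-key and surrogate-key table pairs from the test bed above already exist (the table and column names here are placeholders, since the question's script is truncated):
```
-- Placeholder table and column names; substitute the natural-key and surrogate-key
-- pairs created by the test bed above.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

DBCC DROPCLEANBUFFERS;  -- cold-cache run; only do this on a test server

SELECT *
FROM dbo.PrimaryNatural AS p
INNER JOIN dbo.SecondaryNatural AS s ON s.NaturalKey = p.NaturalKey;

DBCC DROPCLEANBUFFERS;

SELECT *
FROM dbo.PrimarySurrogate AS p
INNER JOIN dbo.SecondarySurrogate AS s ON s.PrimaryId = p.PrimaryId;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```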
Context
StackExchange Database Administrators Q#50708, answer score: 23
Revisions (0)
No revisions yet.