HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Is 'Avoid creating a clustered index based on an incrementing key' a myth from SQL Server 2000 days?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
clustered2000creatingsqlincrementingavoidmythbasedserverindex

Problem

Our databases consist of lots of tables, most of them using an integer surrogate key as a primary key. About half of these primary keys are on identity columns.

The database development started in the days of SQL Server 6.0.

One of the rules followed from the beginning was, as you find in these Index Optimization Tips:

Avoid creating a clustered index based on an incrementing key.

For example, if a table has surrogate integer primary key declared as IDENTITY and the clustered index was created on this column, then every time data is inserted into this table, the rows will be added to the end of the table. When many rows will be added a "hot spot" can occur. A "hot spot" occurs when many queries try to read or write data in the same area at the same time. A "hot spot" results in I/O bottleneck.

Note. By default, SQL Server creates clustered index for the primary key constraint. So, in this case, you should explicitly specify NONCLUSTERED keyword to indicate that a nonclustered index is created for the primary key constraint.

Now using SQL Server 2005 and SQL Server 2008, I have the strong impression that the circumstances changed. Meanwhile, these primary key columns are perfect first candidates for the clustered index of the table.

Solution

The myth goes back to before SQL Server 6.5, which added row level locking. And hinted at here by Kalen Delaney.

It was to do with "hot spots" of data page usage and the fact that a whole 2k page (SQL Server 7 and higher use 8k pages) was locked, rather then an inserted row

Found authoritative article by Kimberly L. Tripp

"The Clustered Index Debate Continues..."

Hotspots were something that we greatly tried to avoid PRIOR to SQL Server 7.0 because of page level locking (and this is where the term hot spot became a negative term). In fact, it doesn't have to be a negative term. However, since the storage engine was rearchitected/redesigned (in SQL Server 7.0) and now includes true row level locking, this motivation (to avoid hotspots) is no longer there.

The link in lucky7_2000's answer seems to say that hotspots can exist and they cause issues.
However, the article uses a non-unique clustered index on TranTime. This requires a uniquifier to be added. Which means the index in not strictly monotonically increasing (and too wide). The link in that answer does not contradict this answer or my links.

On a personal level, I have woked on databases where I inserted tens of thousands of rows per second into a table that has a bigint IDENTITY column as the clustered PK.

Context

StackExchange Database Administrators Q#1584, answer score: 33

Revisions (0)

No revisions yet.