patternsqlModerate
Is SYSUTCDATETIME a safe primary key?
Viewed 0 times
sysutcdatetimekeyprimarysafe
Problem
I have a serially modified table of processed events.
A utc time for when it was processed and an eventId.
Is it safe to use SYSUTCDATETIME() as my primary key? I can easily use a surrogate key but it felt funny.
A utc time for when it was processed and an eventId.
Is it safe to use SYSUTCDATETIME() as my primary key? I can easily use a surrogate key but it felt funny.
Solution
( Please note that the original wording of the Question was: "Is SYSUTCDATETIME a safe clustered index?" )
What exactly do you mean by "safe"? A Clustered Index doesn't need to be unique, so it is "safe" in the sense that it won't break anything. But it can't be a Primary Key since it can't be guaranteed to be unique. Just try the following and you will see that the value coming back from the function is the same across all rows:
And so the question is then: what exactly would you gain by using
So, I would go with an
UPDATE
No, I personally would not trust a value, even with such high resolution as
UPDATE 2
@ypercubeᵀᴹ brings up a good point in a comment below (a comment that will soon magically disappear now that I mention it here ;-):
you could use the datetime column for the Clustered index and the identity column for the PK. (That's sometimes good, if all or almost all your queries depend on ordering by the datetime. So the occasional not matching of the two orders won't hurt performance.)
Yes, this is sometimes a good option, but it is not without potential consequence due to the Clustered Index key(s) getting copied into Non-Clustered Indexes:
-
If there is only one Non-Clustered Index on this table, then the space taken up by the index will be mostly the same either way. But because the Clustered Index keys have to ultimately be unique (at least internally) as they are the RowID for the Non-Clustered Indexes, for all rows with duplicate
-
If there are several Non-Clustered Indexes, then the impact on space used will be:
(4 bytes minimum + 4 bytes for any duplicated Clustered Index key values)
* number_of_indexes
* rows_in_table (less if filtered indexes are used)
These numbers might seem small to some, but if we are working with hundreds of millions of rows, they do add up. And for those who mistakenly believe that "disk is cheap" please consider that a) Enterprise storage is not cheap, and b) disk operations such as index maintenance, etc are also not cheap (less of an issue with SSD, but still). For a more detailed analysis regarding the down-stream effects of data modeling decisions, please see the following article I published on SQL Server Central: Disk Is Cheap! ORLY? (that site requires free registration to view content).
This is not to say "don't do it", but more so "do it only if the benefit outweighs the cost".
UPDATE 3
For the sake of completeness, I should mention that one issue with using
Likewise with
Please see the following Stack Overflow answer for more details, including a link to a project on CodeProject.com that has code that can be used in SQLCLR to
What exactly do you mean by "safe"? A Clustered Index doesn't need to be unique, so it is "safe" in the sense that it won't break anything. But it can't be a Primary Key since it can't be guaranteed to be unique. Just try the following and you will see that the value coming back from the function is the same across all rows:
SELECT SYSUTCDATETIME(), * FROM sys.objects;
And so the question is then: what exactly would you gain by using
SYSUTCDATETIME() instead of an IDENTITY? I assume you are using a DATETIME2 column, which is 8 bytes, yet you could most likely get away wit a 4 byte INT for the IDENTITY column.So, I would go with an
INT populated via IDENTITY as it is:- smaller (4 bytes instead of 8; and that is 8 bytes in the ideal case of having unique values, but when there are duplicates, a "uniquifier" value is added to any duplicate rows that is an extra 4 bytes)
- guaranteed unique (not only better for a Clustered Index, but eligible to be a PK).
- they would essentially be in the same order anyway (although it is possible that occasionally they will not match in order)
UPDATE
No, I personally would not trust a value, even with such high resolution as
DATETIME2, to be a Primary Key since it is still possible for two operations to happen at the same microsecond. AND, you cannot guarantee that at no point in the future will someone or some process insert more than 1 record in a single statement.UPDATE 2
@ypercubeᵀᴹ brings up a good point in a comment below (a comment that will soon magically disappear now that I mention it here ;-):
you could use the datetime column for the Clustered index and the identity column for the PK. (That's sometimes good, if all or almost all your queries depend on ordering by the datetime. So the occasional not matching of the two orders won't hurt performance.)
Yes, this is sometimes a good option, but it is not without potential consequence due to the Clustered Index key(s) getting copied into Non-Clustered Indexes:
-
If there is only one Non-Clustered Index on this table, then the space taken up by the index will be mostly the same either way. But because the Clustered Index keys have to ultimately be unique (at least internally) as they are the RowID for the Non-Clustered Indexes, for all rows with duplicate
DATETIME2 values there will be that extra 4 byte "uniquifier" added. And it will be added twice: once to the Clustered Index, and once to the Non-Clustered Index for the IDENTITY PK.-
If there are several Non-Clustered Indexes, then the impact on space used will be:
(4 bytes minimum + 4 bytes for any duplicated Clustered Index key values)
* number_of_indexes
* rows_in_table (less if filtered indexes are used)
These numbers might seem small to some, but if we are working with hundreds of millions of rows, they do add up. And for those who mistakenly believe that "disk is cheap" please consider that a) Enterprise storage is not cheap, and b) disk operations such as index maintenance, etc are also not cheap (less of an issue with SSD, but still). For a more detailed analysis regarding the down-stream effects of data modeling decisions, please see the following article I published on SQL Server Central: Disk Is Cheap! ORLY? (that site requires free registration to view content).
This is not to say "don't do it", but more so "do it only if the benefit outweighs the cost".
UPDATE 3
For the sake of completeness, I should mention that one issue with using
DATETIME2 values in the hopes of separating individual INSERT and/or UPDATE statements is that they are not as granular as they appear to be. This is an issue between "resolution" and "precision". The "precision" of DATETIME2 is 7 decimal places. But that doesn't mean that the most granular time value represented is incremented at the next value as it happens. This is just like with DATETIME values that are precise to the millisecond, you will never get a value that has anything but a 0, 3, or 7 in the millisecond position:SELECT CONVERT(DATETIME, '2016-04-14 20:30:40.121'), -- 2016-04-14 20:30:40.120
CONVERT(DATETIME, '2016-04-14 20:30:40.122'), -- 2016-04-14 20:30:40.123
CONVERT(DATETIME, '2016-04-14 20:30:40.128'); -- 2016-04-14 20:30:40.127
Likewise with
DATETIME2 values, there is a moment when a call to SYSUTCDATETIME() is completely accurate. But because the value does not get refreshed at the next microsecond, the reported value will stay constant for a moment until the next time it is refreshed. This is why the simple test that @Paul posted in a comment on the Question gets PK violations, even though the two INSERT statements are arguably running at least 1 microsecond apart.Please see the following Stack Overflow answer for more details, including a link to a project on CodeProject.com that has code that can be used in SQLCLR to
Context
StackExchange Database Administrators Q#135122, answer score: 11
Revisions (0)
No revisions yet.