patternsqlMinor
checksum(newid()) as primary key - what happens if a collision occurs?
Viewed 0 times
primarywhatcollisionchecksumnewidhappensoccurskey
Problem
Inspecting a rather critical database that is used by some software on my system, I found that one of the tables had a primary key on the column
So:
Id, where Id is calculated using checksum(newid()). This makes me feel uneasy; the newids are guaranteed to be unique, but by applying the checksum (to a 32-bit integer, presumably for performance reasons?), you have some chance that you get a collision (with 1M rows in the table, I'd put that chance at around 1:4000, too much for my liking).So:
- Am I missing some crucial bit of information that says the above is actually okay?
- If not - what happens if the application tries to insert a new row into the table and the
checksum(newid())gives a primary key that already exists? Will my table blow up? Will the insert fail and leave it up to the application what to do with it?
Solution
I have to presume that the application is written in such a way that it tries the insert, and if it fails, it just tries again. Otherwise the insert will fail (PK violation) and your users will have complained and filed bugs.
If the purpose of this is to just assign unique numbers in a random order, then it may be the case that the application is working just fine. I think you should only file a bug report if you can demonstrate that the application does something wrong when a collision occurs - not easy given the likely failure rate. An easier way, if you have source code access, would be to review the portion of the code where they perform the insert (or find out if it calls a stored procedure to do so).
That in mind, there is a much easier and fault tolerant way to do this. Build a table up front, with a slew of numbers ordered randomly, then just pluck a number off the top whenever you need one.
Then when you need to add a row, you can just say:
(You could also
If the purpose of this is to just assign unique numbers in a random order, then it may be the case that the application is working just fine. I think you should only file a bug report if you can demonstrate that the application does something wrong when a collision occurs - not easy given the likely failure rate. An easier way, if you have source code access, would be to review the portion of the code where they perform the insert (or find out if it calls a stored procedure to do so).
That in mind, there is a much easier and fault tolerant way to do this. Build a table up front, with a slew of numbers ordered randomly, then just pluck a number off the top whenever you need one.
CREATE TABLE dbo.Destination(ID INT /*, other columns */);
CREATE TABLE dbo.Source
(
RowNum INT PRIMARY KEY,
ID INT, -- UNIQUE
Used BIT NOT NULL DEFAULT 0
);
INSERT dbo.Source(RowNum, ID)
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()), ID
FROM
(
SELECT ID = CHECKSUM(NEWID())
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
-- should produce about 4 million rows,
-- on my crappy VM this took two minutes,
-- add another cross join if you need more
)
AS x GROUP BY ID;Then when you need to add a row, you can just say:
UPDATE TOP (1) dbo.Source SET Used = 1
OUTPUT inserted.ID /*, @params */
INTO dbo.Destination
WHERE Used = 0;(You could also
DELETE TOP (1) however I find keeping them useful for auditing etc.)Code Snippets
CREATE TABLE dbo.Destination(ID INT /*, other columns */);
CREATE TABLE dbo.Source
(
RowNum INT PRIMARY KEY,
ID INT, -- UNIQUE
Used BIT NOT NULL DEFAULT 0
);
INSERT dbo.Source(RowNum, ID)
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()), ID
FROM
(
SELECT ID = CHECKSUM(NEWID())
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
-- should produce about 4 million rows,
-- on my crappy VM this took two minutes,
-- add another cross join if you need more
)
AS x GROUP BY ID;UPDATE TOP (1) dbo.Source SET Used = 1
OUTPUT inserted.ID /*, @params */
INTO dbo.Destination
WHERE Used = 0;Context
StackExchange Database Administrators Q#42738, answer score: 5
Revisions (0)
No revisions yet.