snippetsqlMinor
How to handle uniqueness of many large columns in SQL Server?
Viewed 0 times
howcolumnshandlesqllargemanyserveruniqueness
Problem
I have the following table:
Although I have insisted to find a better natural identifier, I had to deal with the following natural unique tuple:
This is way too large for an UNIQUE index (max 900 bytes in SQL Server 2014 or less), so I had to come up with something. My idea is to compute a hash for these columns, so I had a
It pr
CREATE TABLE dbo.Document
(
DocumentId int IDENTITY(1,1) NOT NULL CONSTRAINT PK_DocumentPRIMARY KEY CLUSTERED,
[Timestamp] datetime2(7) NOT NULL CONSTRAINT DF_Document_Timestamp DEFAULT (getdate()),
CreatedBy nvarchar(128) NOT NULL CONSTRAINT DF_Document_CreatedBy DEFAULT (dbo.getCurrentUser()),
MonthId int NOT NULL,
TimeModeId int NOT NULL CONSTRAINT FK_Document_TimeMode REFERENCES usr.TimeMode,
Key1 bit NOT NULL,
Key2 int NULL,
Key3 varchar(max) NULL, -- sometimes above 8000chars
Key4 varchar(max) NULL, -- sometimes above 8000chars
Key5 varchar(max) NULL, -- sometimes above 8000chars
Key6 varchar(max) NULL, -- sometimes above 8000chars
Key7 varchar(max) NULL, -- sometimes above 8000chars
Key8 int NOT NULL,
CONSTRAINT FK_Document_BrandType FOREIGN KEY(Key8) REFERENCES dbo.BrandType (Key8),
)Although I have insisted to find a better natural identifier, I had to deal with the following natural unique tuple:
MonthId, TimeModeId, Key1, ... , Key8This is way too large for an UNIQUE index (max 900 bytes in SQL Server 2014 or less), so I had to come up with something. My idea is to compute a hash for these columns, so I had a
PERSISTED COMPUTED columns as above:FiltersHash AS (hashbytes('SHA2_256',(
(((((((((((((((
(CONVERT(varchar(10),MonthId)+'|')
+ CONVERT(varchar(4),TimeModeId))
+'|')+CONVERT(varchar(4),Key1))
+'|')+isnull(CONVERT(varchar(max),Key2),''))
+'|')+isnull(CONVERT(varchar(max),Key3),''))
+'|')+isnull(CONVERT(varchar(max),Key4),''))
+'|')+isnull(CONVERT(varchar(max),Key5),''))
+'|')+isnull(CONVERT(varchar(max),Key6),''))
+'|')+isnull(CONVERT(varchar(max),Key7),''))
+'|')+isnull(CONVERT(varchar(4),Key8),''))
) PERSISTED,
CONSTRAINT UQ_Document_FiltersHash UNIQUE NONCLUSTERED (FiltersHash),It pr
Solution
The chance of a hash collision is pretty astronomical (as discussed elsewhere on Stack Exchange: https://stackoverflow.com/a/4014407). However, you can reduce it further by adding a second key:
Now two records must match on three fields and the first part of five more. If your data commonly includes vertical pipes, consider an alternative separator. INSERTs into the table will be a little slower, but at your data volumes it's probably not a concern.
As an aside, you're probably better off clustering on
...
FiltersHash AS HASHBYTES('SHA2_256', /* Various fields */) PERSISTED,
KeyPrefixes AS CAST(Key1 AS CHAR(1) + '|' + CAST(Key2 AS VARCHAR(10))
+ '|' + LEFT(Key3, 100) + '|' + LEFT(Key4, 100)
+ '|' + LEFT(Key5, 100) + '|' + LEFT(Key6, 100)
+ '|' + LEFT(Key7, 100) + '|' + CAST(Key8 AS VARCHAR(10)) PERSISTED
...
CREATE UNIQUE INDEX UQ_Docs_BizKey ON Documents (FiltersHash, KeyPrefixes)Now two records must match on three fields and the first part of five more. If your data commonly includes vertical pipes, consider an alternative separator. INSERTs into the table will be a little slower, but at your data volumes it's probably not a concern.
As an aside, you're probably better off clustering on
MonthID and leaving DocumentID a non-clustered PK, assuming people occasionally run searches by date range ("all documents from June") but rarely search for a range of document IDs.Code Snippets
...
FiltersHash AS HASHBYTES('SHA2_256', /* Various fields */) PERSISTED,
KeyPrefixes AS CAST(Key1 AS CHAR(1) + '|' + CAST(Key2 AS VARCHAR(10))
+ '|' + LEFT(Key3, 100) + '|' + LEFT(Key4, 100)
+ '|' + LEFT(Key5, 100) + '|' + LEFT(Key6, 100)
+ '|' + LEFT(Key7, 100) + '|' + CAST(Key8 AS VARCHAR(10)) PERSISTED
...
CREATE UNIQUE INDEX UQ_Docs_BizKey ON Documents (FiltersHash, KeyPrefixes)Context
StackExchange Database Administrators Q#164179, answer score: 2
Revisions (0)
No revisions yet.