HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to handle uniqueness of many large columns in SQL Server?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
howcolumnshandlesqllargemanyserveruniqueness

Problem

I have the following table:

CREATE TABLE dbo.Document
(
    DocumentId int IDENTITY(1,1) NOT NULL CONSTRAINT PK_DocumentPRIMARY KEY CLUSTERED,
    [Timestamp] datetime2(7) NOT NULL CONSTRAINT DF_Document_Timestamp  DEFAULT (getdate()),
    CreatedBy nvarchar(128) NOT NULL CONSTRAINT DF_Document_CreatedBy  DEFAULT (dbo.getCurrentUser()),
    MonthId int NOT NULL,
    TimeModeId int NOT NULL CONSTRAINT FK_Document_TimeMode REFERENCES usr.TimeMode,
    Key1 bit NOT NULL,
    Key2 int NULL,
    Key3 varchar(max) NULL,   -- sometimes above 8000chars
    Key4 varchar(max) NULL,   -- sometimes above 8000chars
    Key5 varchar(max) NULL,   -- sometimes above 8000chars
    Key6 varchar(max) NULL,   -- sometimes above 8000chars
    Key7 varchar(max) NULL,   -- sometimes above 8000chars
    Key8 int NOT NULL,
    CONSTRAINT FK_Document_BrandType FOREIGN KEY(Key8) REFERENCES dbo.BrandType (Key8),
)


Although I have insisted to find a better natural identifier, I had to deal with the following natural unique tuple:

MonthId, TimeModeId, Key1, ... , Key8


This is way too large for an UNIQUE index (max 900 bytes in SQL Server 2014 or less), so I had to come up with something. My idea is to compute a hash for these columns, so I had a PERSISTED COMPUTED columns as above:

FiltersHash  AS (hashbytes('SHA2_256',(
        (((((((((((((((
            (CONVERT(varchar(10),MonthId)+'|') 
            + CONVERT(varchar(4),TimeModeId))
            +'|')+CONVERT(varchar(4),Key1))
            +'|')+isnull(CONVERT(varchar(max),Key2),''))
            +'|')+isnull(CONVERT(varchar(max),Key3),''))
            +'|')+isnull(CONVERT(varchar(max),Key4),''))
            +'|')+isnull(CONVERT(varchar(max),Key5),''))
            +'|')+isnull(CONVERT(varchar(max),Key6),''))
            +'|')+isnull(CONVERT(varchar(max),Key7),''))
            +'|')+isnull(CONVERT(varchar(4),Key8),''))
        ) PERSISTED,
CONSTRAINT UQ_Document_FiltersHash UNIQUE NONCLUSTERED (FiltersHash),


It pr

Solution

The chance of a hash collision is pretty astronomical (as discussed elsewhere on Stack Exchange: https://stackoverflow.com/a/4014407). However, you can reduce it further by adding a second key:

...
FiltersHash  AS HASHBYTES('SHA2_256', /* Various fields */) PERSISTED,
KeyPrefixes  AS CAST(Key1 AS CHAR(1) + '|' + CAST(Key2 AS VARCHAR(10))
    + '|' + LEFT(Key3, 100) + '|' + LEFT(Key4, 100)
    + '|' + LEFT(Key5, 100) + '|' + LEFT(Key6, 100)
    + '|' + LEFT(Key7, 100) + '|' + CAST(Key8 AS VARCHAR(10)) PERSISTED
...
CREATE UNIQUE INDEX UQ_Docs_BizKey ON Documents (FiltersHash, KeyPrefixes)


Now two records must match on three fields and the first part of five more. If your data commonly includes vertical pipes, consider an alternative separator. INSERTs into the table will be a little slower, but at your data volumes it's probably not a concern.

As an aside, you're probably better off clustering on MonthID and leaving DocumentID a non-clustered PK, assuming people occasionally run searches by date range ("all documents from June") but rarely search for a range of document IDs.

Code Snippets

...
FiltersHash  AS HASHBYTES('SHA2_256', /* Various fields */) PERSISTED,
KeyPrefixes  AS CAST(Key1 AS CHAR(1) + '|' + CAST(Key2 AS VARCHAR(10))
    + '|' + LEFT(Key3, 100) + '|' + LEFT(Key4, 100)
    + '|' + LEFT(Key5, 100) + '|' + LEFT(Key6, 100)
    + '|' + LEFT(Key7, 100) + '|' + CAST(Key8 AS VARCHAR(10)) PERSISTED
...
CREATE UNIQUE INDEX UQ_Docs_BizKey ON Documents (FiltersHash, KeyPrefixes)

Context

StackExchange Database Administrators Q#164179, answer score: 2

Revisions (0)

No revisions yet.