HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

What is the best to add unique index on nvarchar(max) field on existing table?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
uniquefieldthenvarcharwhatmaxexistingindextableadd

Problem

The situation I find myself in is that I am getting bad data into the database for some reason. I want to preserve data integrity by adding to the User table a UNIQUE INDEX on the UFID field.

Naturally I generated a script to remove any duplicate records, in a non destructive way:

update [user]
set UFID = UFID + '_dup_removal'
where ufid in (
  select ufid 
  from [user]
  group by ufid
  having count(ufid) > 1
)


then I found that the field is unable to be used in a UNIQUE INDEX.

So now what is the best way to get to a result where I have the user table and data with the addition of the UNIQUE INDEX.

Note: it should be done in a way that allows easy reversal.

Solution

varchar(max) columns cannot be indexed, as you have found out.
You will need to index something else.

It could be a shortened version of the data (only you can tell whether this is acceptable or not) or a hashed version of the data. The odds of a hash collision are very low.

USE tempdb;

IF OBJECT_ID('user') IS NOT NULL
    DROP TABLE [user];

CREATE TABLE [user] (
    UFID varchar(max)
)

INSERT INTO [user] VALUES(1),(2),(3),(4),(4),(4),(3),(5);

WITH ranked_users AS (
    SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY UFID ORDER BY (SELECT NULL))
    FROM [user]
)
update ranked_users
set UFID = UFID + '_dup_removal_' + CAST(RN AS varchar(10))
where RN > 1;

--CREATE UNIQUE INDEX IX_UFID ON [user](UFID) -- FAILS

-- Solution 1: add a computed column with the size trimmed down 
ALTER TABLE [user] ADD shortened_ufid AS CAST(UFID AS varchar(900))

CREATE UNIQUE INDEX IX_shortened_UFID ON [user](shortened_ufid) 

-- Solution 2: add a computed column with the hashed version of the data
ALTER TABLE [user] ADD hashed_ufid AS CAST(HASHBYTES('SHA1', UFID) AS bigint)

CREATE UNIQUE INDEX IX_hashed_UFID ON [user](hashed_UFID) 

SELECT *
FROM [user]
GO


Another option is cleansing the data and then ensure no duplicates can be inserted using a trigger. This is going to be painfully slow, though.

-- solution 3: use a trigger
CREATE TRIGGER TR_no_dupes ON [user]
FOR INSERT, UPDATE
AS
BEGIN

    IF EXISTS (
        SELECT 1
        FROM [user] 
        WHERE UFID IN (
            SELECT UFID 
            FROM inserted
        )
        GROUP BY UFID
        HAVING COUNT(*) > 1
    )
    ROLLBACK;

END

Code Snippets

USE tempdb;

IF OBJECT_ID('user') IS NOT NULL
    DROP TABLE [user];

CREATE TABLE [user] (
    UFID varchar(max)
)

INSERT INTO [user] VALUES(1),(2),(3),(4),(4),(4),(3),(5);


WITH ranked_users AS (
    SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY UFID ORDER BY (SELECT NULL))
    FROM [user]
)
update ranked_users
set UFID = UFID + '_dup_removal_' + CAST(RN AS varchar(10))
where RN > 1;


--CREATE UNIQUE INDEX IX_UFID ON [user](UFID) -- FAILS

-- Solution 1: add a computed column with the size trimmed down 
ALTER TABLE [user] ADD shortened_ufid AS CAST(UFID AS varchar(900))

CREATE UNIQUE INDEX IX_shortened_UFID ON [user](shortened_ufid) 

-- Solution 2: add a computed column with the hashed version of the data
ALTER TABLE [user] ADD hashed_ufid AS CAST(HASHBYTES('SHA1', UFID) AS bigint)

CREATE UNIQUE INDEX IX_hashed_UFID ON [user](hashed_UFID) 

SELECT *
FROM [user]
GO
-- solution 3: use a trigger
CREATE TRIGGER TR_no_dupes ON [user]
FOR INSERT, UPDATE
AS
BEGIN


    IF EXISTS (
        SELECT 1
        FROM [user] 
        WHERE UFID IN (
            SELECT UFID 
            FROM inserted
        )
        GROUP BY UFID
        HAVING COUNT(*) > 1
    )
    ROLLBACK;

END

Context

StackExchange Database Administrators Q#82395, answer score: 7

Revisions (0)

No revisions yet.