patternsqlMinor
What is the best to add unique index on nvarchar(max) field on existing table?
Viewed 0 times
uniquefieldthenvarcharwhatmaxexistingindextableadd
Problem
The situation I find myself in is that I am getting bad data into the database for some reason. I want to preserve data integrity by adding to the User table a
Naturally I generated a script to remove any duplicate records, in a non destructive way:
then I found that the field is unable to be used in a
So now what is the best way to get to a result where I have the user table and data with the addition of the
Note: it should be done in a way that allows easy reversal.
UNIQUE INDEX on the UFID field. Naturally I generated a script to remove any duplicate records, in a non destructive way:
update [user]
set UFID = UFID + '_dup_removal'
where ufid in (
select ufid
from [user]
group by ufid
having count(ufid) > 1
)then I found that the field is unable to be used in a
UNIQUE INDEX. So now what is the best way to get to a result where I have the user table and data with the addition of the
UNIQUE INDEX. Note: it should be done in a way that allows easy reversal.
Solution
varchar(max) columns cannot be indexed, as you have found out.
You will need to index something else.
It could be a shortened version of the data (only you can tell whether this is acceptable or not) or a hashed version of the data. The odds of a hash collision are very low.
Another option is cleansing the data and then ensure no duplicates can be inserted using a trigger. This is going to be painfully slow, though.
You will need to index something else.
It could be a shortened version of the data (only you can tell whether this is acceptable or not) or a hashed version of the data. The odds of a hash collision are very low.
USE tempdb;
IF OBJECT_ID('user') IS NOT NULL
DROP TABLE [user];
CREATE TABLE [user] (
UFID varchar(max)
)
INSERT INTO [user] VALUES(1),(2),(3),(4),(4),(4),(3),(5);
WITH ranked_users AS (
SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY UFID ORDER BY (SELECT NULL))
FROM [user]
)
update ranked_users
set UFID = UFID + '_dup_removal_' + CAST(RN AS varchar(10))
where RN > 1;
--CREATE UNIQUE INDEX IX_UFID ON [user](UFID) -- FAILS
-- Solution 1: add a computed column with the size trimmed down
ALTER TABLE [user] ADD shortened_ufid AS CAST(UFID AS varchar(900))
CREATE UNIQUE INDEX IX_shortened_UFID ON [user](shortened_ufid)
-- Solution 2: add a computed column with the hashed version of the data
ALTER TABLE [user] ADD hashed_ufid AS CAST(HASHBYTES('SHA1', UFID) AS bigint)
CREATE UNIQUE INDEX IX_hashed_UFID ON [user](hashed_UFID)
SELECT *
FROM [user]
GOAnother option is cleansing the data and then ensure no duplicates can be inserted using a trigger. This is going to be painfully slow, though.
-- solution 3: use a trigger
CREATE TRIGGER TR_no_dupes ON [user]
FOR INSERT, UPDATE
AS
BEGIN
IF EXISTS (
SELECT 1
FROM [user]
WHERE UFID IN (
SELECT UFID
FROM inserted
)
GROUP BY UFID
HAVING COUNT(*) > 1
)
ROLLBACK;
ENDCode Snippets
USE tempdb;
IF OBJECT_ID('user') IS NOT NULL
DROP TABLE [user];
CREATE TABLE [user] (
UFID varchar(max)
)
INSERT INTO [user] VALUES(1),(2),(3),(4),(4),(4),(3),(5);
WITH ranked_users AS (
SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY UFID ORDER BY (SELECT NULL))
FROM [user]
)
update ranked_users
set UFID = UFID + '_dup_removal_' + CAST(RN AS varchar(10))
where RN > 1;
--CREATE UNIQUE INDEX IX_UFID ON [user](UFID) -- FAILS
-- Solution 1: add a computed column with the size trimmed down
ALTER TABLE [user] ADD shortened_ufid AS CAST(UFID AS varchar(900))
CREATE UNIQUE INDEX IX_shortened_UFID ON [user](shortened_ufid)
-- Solution 2: add a computed column with the hashed version of the data
ALTER TABLE [user] ADD hashed_ufid AS CAST(HASHBYTES('SHA1', UFID) AS bigint)
CREATE UNIQUE INDEX IX_hashed_UFID ON [user](hashed_UFID)
SELECT *
FROM [user]
GO-- solution 3: use a trigger
CREATE TRIGGER TR_no_dupes ON [user]
FOR INSERT, UPDATE
AS
BEGIN
IF EXISTS (
SELECT 1
FROM [user]
WHERE UFID IN (
SELECT UFID
FROM inserted
)
GROUP BY UFID
HAVING COUNT(*) > 1
)
ROLLBACK;
ENDContext
StackExchange Database Administrators Q#82395, answer score: 7
Revisions (0)
No revisions yet.