HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Problematic slashes and questions marks in unique index values

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
uniquevaluesslashesquestionsandproblematicindexmarks

Problem

What is it about the Phone values inserted below that SQL Server is treating them as identical in a unique index?

CREATE TABLE Phone
(
  Id int identity(1, 1) primary key,
  Phone nvarchar(448) not null
)
go

create unique index IX_Phone on Phone(Phone)
with (data_compression = page);
go

insert into Phone Values ('?281/?263-?8400');
insert into Phone Values ('‎281/‎263-‎8400');

select * from Phone;

drop table Phone;


I receive an error message:


Msg 2601, Level 14, State 1, Line 13
Cannot insert duplicate key row in object 'dbo.Phone' with unique index 'IX_Phone'. The duplicate key value is (?281/?263-?8400).

Solution

Your problem is that you passes unicode strings as non-unicode.

Your '‎281/‎263-‎8400' is a string of 15 characters, not 12, there are 3 non printable 8206 symbols, Left-to-right mark

select len('‎281/‎263-‎8400'); -- 15 !!!


Try this code where I pass the unicode values as unicode and see that there is no problem at all:

CREATE TABLE dbo.Phone
(
  Id int identity(1, 1) primary key,
  Phone nvarchar(448) not null
)
go

create unique index IX_Phone on Phone(Phone)
with (data_compression = page);
go

insert into Phone(phone) Values (N'?281/?263-?8400');
insert into Phone(phone) Values (N'‎281/‎263-‎8400');

select * from Phone;


And this is what your string '‎281/‎263-‎8400' really contains (dbo.nums is my table that contains natural numbers):

declare @t table(col1 nvarchar(100), col2 nvarchar(100));
insert into @t values (N'?281/?263-?8400',  N'‎281/‎263-‎8400'); 

select n, unicode(substring(col1, n, 1)), unicode(substring(col2, n, 1))
from @t cross join dbo.nums
where n <= 15;


Now what happens when you pass your unicode string as non unicode.

Your non-printable symbol 8206 is transformed into ?, that is how non-unicode strings works: every character that cannot be find in the corresponding codepage and represented as ascii code is substituted with question mark.

So for example if you use Latin Collation and try to compare Hebrew and Cyrillic characters (the same number of characters) as varchar they always be equal just because they are transformed to question marks while comparing as nvarchar they are different:

So if now you'll try to insert these 2 values (Hebrew + Cyrillic) into your Phone table passing them as non-unicode (without using N), only one of them will be inserted, and the other will be rejected by unique constraint.
And if you'll try to select from Phone, '?????' will be returned

Code Snippets

select len('‎281/‎263-‎8400'); -- 15 !!!
CREATE TABLE dbo.Phone
(
  Id int identity(1, 1) primary key,
  Phone nvarchar(448) not null
)
go

create unique index IX_Phone on Phone(Phone)
with (data_compression = page);
go

insert into Phone(phone) Values (N'?281/?263-?8400');
insert into Phone(phone) Values (N'‎281/‎263-‎8400');

select * from Phone;
declare @t table(col1 nvarchar(100), col2 nvarchar(100));
insert into @t values (N'?281/?263-?8400',  N'‎281/‎263-‎8400'); 

select n, unicode(substring(col1, n, 1)), unicode(substring(col2, n, 1))
from @t cross join dbo.nums
where n <= 15;

Context

StackExchange Database Administrators Q#185133, answer score: 4

Revisions (0)

No revisions yet.