HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Equality query on NVARCHAR column yields multiple results in SQL Server 2012

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
equalitynvarchar2012columnsqlqueryyieldsmultipleserverresults

Problem

I am in the process of moving a pet project from PostgreSQL (9.2.2) to SQL Server (2012 Standard).

I've noticed an interesting phenomenon when querying unicode words. Given the definition:

CREATE TABLE [word](
    [id] [int] IDENTITY(0,1) NOT NULL,
    [value] [nvarchar](255) NULL    
 );


and the data:

insert into word (value) values (N'ῥύπῳ');
insert into word  (value) values (N'ἀπὸ');
insert into word  (value) values (N'ἀπό');
insert into word (value) values  (N'ἐπὶ');
insert into word (value) values  (N'ἐπί');
insert into word (value) values  (N'ὑπὸ');
insert into word (value) values  (N'ὑπό');
insert into word (value) values  (N'πίῃ');

insert into word  (value) values (N'λόγους');
insert into word  (value) values (N'λόγχῃ');
insert into word (value) values  (N'λόγων');
insert into word  (value) values (N'ἀλόης');


a query for a particular word will return near matches. For example:

select * from word where value = N'ἀπὸ'


returns:

id  value
102137  ῥύπῳ
102141  ἀπὸ
102142  ἀπό
102143  ἐπὶ
102144  ἐπί
102145  ὑπὸ
102146  ὑπό
102147  πίῃ


http://sqlfiddle.com/#!6/1ab66/1

However, the same pattern in PostgreSQL only returns the exact match. How can I get SQL Server to do the same?

(PostgreSQL fiddle link): http://sqlfiddle.com/#!12/c57a6/1

I have the distinct feeling that I'm missing something, but I'm not quite able to figure out what it is.

The database collation is SQL_Latin1_General_CP1_CI_AS (which is also the server's collation) on a local installation.

Solution

The collation determines the comparison semantics.

If I try

CREATE TABLE [word](
    [id] [int] IDENTITY(0,1) NOT NULL,
    [value] [nvarchar](255) COLLATE Latin1_General_100_CI_AS NULL    
 );


It only returns ἀπὸ.

Changing the suffix to AI for accent insensitive returns ἀπό also.

On my install I have tried every collation and 1526 return 1 (presumably AS and BIN collations), 1264 return 2 rows (presumably AI) and 1095 return 8.

From a quick look through this last group looks to include all the SQL collations and 90 collations whereas all the 100 ones are in the first 2 groups so I presume this is some issue that has been fixed in the 2008 batch of collations. (See What's New in SQL Server 2008 Collations)

Script to try this yourself

DECLARE @Results TABLE
(
Count INT,
Collation SYSNAME
)

SET NOCOUNT ON;
DECLARE @N SYSNAME;
DECLARE @C1 AS CURSOR;
SET @C1 = CURSOR FAST_FORWARD FOR 
SELECT name
FROM sys.fn_helpcollations();
OPEN @C1;
FETCH NEXT FROM @C1 INTO @N ;
WHILE @@FETCH_STATUS = 0
BEGIN
  INSERT @Results
  EXEC('SELECT COUNT(*), ''' + @N + ''' from word where value = N''ἀπὸ'' COLLATE ' + @N)
  FETCH NEXT FROM @C1 INTO @N ;
END

SELECT *
FROM @Results
ORDER BY Count DESC

Code Snippets

CREATE TABLE [word](
    [id] [int] IDENTITY(0,1) NOT NULL,
    [value] [nvarchar](255) COLLATE Latin1_General_100_CI_AS NULL    
 );
DECLARE @Results TABLE
(
Count INT,
Collation SYSNAME
)

SET NOCOUNT ON;
DECLARE @N SYSNAME;
DECLARE @C1 AS CURSOR;
SET @C1 = CURSOR FAST_FORWARD FOR 
SELECT name
FROM sys.fn_helpcollations();
OPEN @C1;
FETCH NEXT FROM @C1 INTO @N ;
WHILE @@FETCH_STATUS = 0
BEGIN
  INSERT @Results
  EXEC('SELECT COUNT(*), ''' + @N + ''' from word where value = N''ἀπὸ'' COLLATE ' + @N)
  FETCH NEXT FROM @C1 INTO @N ;
END

SELECT *
FROM @Results
ORDER BY Count DESC

Context

StackExchange Database Administrators Q#31394, answer score: 8

Revisions (0)

No revisions yet.