patternsqlMinor
Problem with pattern matching on unicode characters
Viewed 0 times
problemwithunicodecharacterspatternmatching
Problem
I'm writing a piece of code that uses pattern matching on unicode characters but I'm running into an odd problem. Some characters work and some don't.
The pattern should be pulling for any character that isn't a-z but for some reason it isn't catching some characters. Does anyone know why some unicode characters would match and others wouldn't?
DECLARE @Pattern nvarchar(50) = N'%[^a-z]%' -- Simple pattern
SELECT PatIndex(@Pattern, nchar(46797)), nchar(46797) -- Works
SELECT PatIndex(@Pattern, nchar(14843)), nchar(14843) -- Doesn't WorkThe pattern should be pulling for any character that isn't a-z but for some reason it isn't catching some characters. Does anyone know why some unicode characters would match and others wouldn't?
Solution
See if doing a binary collate fits what you need. Here is a quick test.
USE Tempdb
GO
IF OBJECT_ID('PattMatch') IS NOT NULL
BEGIN
DROP TABLE PattMatch
END
GO
CREATE TABLE PattMatch (COL1 NVARCHAR(50))
GO
INSERT INTO PattMatch
VALUES (nchar(46797)),(nchar(14843)),('ddddddd*'),('lettersand9999')
GO
DECLARE @Pattern nvarchar(50) = N'%[^a-z]%'
SELECT PatIndex(@Pattern, COL1 COLLATE Latin1_General_BIN2), COL1 FROM PattMatch
GO
DROP TABLE PattMatch
GO
--your test
DECLARE @Pattern nvarchar(50) = N'%[^a-z]%'
SELECT PatIndex(@Pattern, nchar(46797) COLLATE Latin1_General_BIN2)
SELECT PatIndex(@Pattern, nchar(14843) COLLATE Latin1_General_BIN2)Code Snippets
USE Tempdb
GO
IF OBJECT_ID('PattMatch') IS NOT NULL
BEGIN
DROP TABLE PattMatch
END
GO
CREATE TABLE PattMatch (COL1 NVARCHAR(50))
GO
INSERT INTO PattMatch
VALUES (nchar(46797)),(nchar(14843)),('ddddddd*'),('lettersand9999')
GO
DECLARE @Pattern nvarchar(50) = N'%[^a-z]%'
SELECT PatIndex(@Pattern, COL1 COLLATE Latin1_General_BIN2), COL1 FROM PattMatch
GO
DROP TABLE PattMatch
GO
--your test
DECLARE @Pattern nvarchar(50) = N'%[^a-z]%'
SELECT PatIndex(@Pattern, nchar(46797) COLLATE Latin1_General_BIN2)
SELECT PatIndex(@Pattern, nchar(14843) COLLATE Latin1_General_BIN2)Context
StackExchange Database Administrators Q#95416, answer score: 5
Revisions (0)
No revisions yet.