snippetsqlMajor
How to check for Non-Ascii Characters
Viewed 0 times
nonasciiforhowcharacterscheck
Problem
What is the best way to check if a VARCHAR field has Non-Ascii Characters?
I tried using
Checking the lower range worked correctly.
My data had three records with 0x1E and all three where returned.
But when I check just the upper range:
It returns close to all the records in the table (table count 170737 and returned count 170735) and since my data did not have any values in this range I would think it should have returned no records.
CHAR(1) through CHAR(31) and CHAR(127) through CHAR(255).I tried using
PATINDEX and have run into the following issue.Checking the lower range worked correctly.
SELECT *
FROM mbrnotes
WHERE PATINDEX('%[' + CHAR(1)+ '-' +CHAR(31)+']%',LINE_TEXT) > 0My data had three records with 0x1E and all three where returned.
But when I check just the upper range:
SELECT *
FROM mbrnotes
WHERE PATINDEX('%[' + CHAR(127)+ '-' +CHAR(255)+']%',LINE_TEXT) > 0It returns close to all the records in the table (table count 170737 and returned count 170735) and since my data did not have any values in this range I would think it should have returned no records.
Solution
Ranges in the pattern syntax use the sorting rules of your collation.
Use a binary collate clause so the range is ordered by character code.
(I also changed it to
If you actually want to see the offending characters and you are on a version with the
You can then use that result in a second call to
Use a binary collate clause so the range is ordered by character code.
(I also changed it to
LIKE as I find that more obvious than PATINDEX > 0)SELECT *
FROM mbrnotes
WHERE LINE_TEXT COLLATE Latin1_General_100_BIN2
LIKE '%[' + CHAR(0)+ '-' +CHAR(31) + CHAR(127)+ '-' +CHAR(255)+']%'If you actually want to see the offending characters and you are on a version with the
TRANSLATE function you can use something like the belowDECLARE @WhiteListedCharacters NVARCHAR(1000)
= ' !"#$%&''()*+,-./0123456789:;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+ '[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'
SELECT text,
REPLACE(
TRANSLATE(
text,
@WhiteListedCharacters COLLATE Latin1_General_100_BIN2,
REPLICATE(
LEFT(@WhiteListedCharacters,1),
LEN(@WhiteListedCharacters))),
LEFT(@WhiteListedCharacters,1),
'') AS BadChars
FROM sys.messages
WHERE language_id = 1038You can then use that result in a second call to
TRANSLATE to preserve only the "good" characters.DECLARE @WhiteListedCharacters NVARCHAR(1000)
= ' !"#$%&''()*+,-./0123456789:;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+ '[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'
SELECT text,
BadChars,
Cleaned = REPLACE(
TRANSLATE(
text COLLATE Latin1_General_100_BIN2,
BadChars,
REPLICATE(N'ψ', LEN(BadChars))
),
N'ψ', N'')
FROM sys.messages
CROSS APPLY
(
SELECT REPLACE(
TRANSLATE(
text,
@WhiteListedCharacters COLLATE Latin1_General_100_BIN2,
REPLICATE(
LEFT(@WhiteListedCharacters,1),
LEN(@WhiteListedCharacters + '-') - 1)),
LEFT(@WhiteListedCharacters,1),
'') AS BadChars
) ca
WHERE language_id = 1038Code Snippets
SELECT *
FROM mbrnotes
WHERE LINE_TEXT COLLATE Latin1_General_100_BIN2
LIKE '%[' + CHAR(0)+ '-' +CHAR(31) + CHAR(127)+ '-' +CHAR(255)+']%'DECLARE @WhiteListedCharacters NVARCHAR(1000)
= ' !"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+ '[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'
SELECT text,
REPLACE(
TRANSLATE(
text,
@WhiteListedCharacters COLLATE Latin1_General_100_BIN2,
REPLICATE(
LEFT(@WhiteListedCharacters,1),
LEN(@WhiteListedCharacters))),
LEFT(@WhiteListedCharacters,1),
'') AS BadChars
FROM sys.messages
WHERE language_id = 1038DECLARE @WhiteListedCharacters NVARCHAR(1000)
= ' !"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ'
+ '[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'
SELECT text,
BadChars,
Cleaned = REPLACE(
TRANSLATE(
text COLLATE Latin1_General_100_BIN2,
BadChars,
REPLICATE(N'ψ', LEN(BadChars))
),
N'ψ', N'')
FROM sys.messages
CROSS APPLY
(
SELECT REPLACE(
TRANSLATE(
text,
@WhiteListedCharacters COLLATE Latin1_General_100_BIN2,
REPLICATE(
LEFT(@WhiteListedCharacters,1),
LEN(@WhiteListedCharacters + '-') - 1)),
LEFT(@WhiteListedCharacters,1),
'') AS BadChars
) ca
WHERE language_id = 1038Context
StackExchange Database Administrators Q#138350, answer score: 27
Revisions (0)
No revisions yet.