patternsqlMinor

Query to find rows containing ASCII characters in a given range

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

containingrowsrangequeryasciifindcharactersgiven

Problem

I am using some scripts from another topic, but the accepted answer isn't working for all my data scenarios. I would have asked my question on the original How to check for Non-Ascii Characters post, but I don't have enough reputation to comment or up-vote as of yet.

Questions:

Why is the accepted answer not working for char(150)?

Is there a way to fix the accepted answer?

My Testing

I created SQL Fiddle with sample data, the stored procedure from one of the answers, and queries to demonstrate the issue.

Query 1: sample_table

-- Note: The "bad dash" row has char(150)

SELECT * FROM sample_table;

+-------------------+
|    DataColumn     |
+-------------------+
| test - good dash  |
| test – bad dash   |
+-------------------+

Query 2: The other answer by John shows the "bad dash" row containing char(150):

SELECT dbo.Find_Invalid_Chars(DataColumn) [Invalid Characters]
FROM sample_table
WHERE dbo.Find_Invalid_Chars(DataColumn) IS NOT NULL;

+----------------------+
|  Invalid Characters  |
+----------------------+
| test [150] bad dash  |
+----------------------+

Query 3: The accepted answer by Martin Smith returns no results:

SELECT DataColumn AS [Bad Data]
FROM sample_table
WHERE DataColumn LIKE '%[' + CHAR(127)+ '-' +CHAR(255)+']%' COLLATE Latin1_General_100_BIN2;

+------------+
| [Bad Data] |
+------------+

-- No rows returned.

Conclusion

Unfortunately, I often need to find characters within (or outside of) a range in databases I can't create stored procedures in. I'd really like to find a fix for the accepted answer or a simple script that wouldn't require creation of any objects (including temp tables).

Any suggestions? Thanks in advance.

EDIT 1: The solution cannot modify or add any objects or settings in the database. I'm looking for a self-contained query that will select rows with one or more characters in a range between two CHAR() numbers, regardless of the ASCII or Extended ASCII number supp

Solution

Why is the accepted answer not working for char(150)?

Actually, it does. The problem is your test is bad / invalid. You test column, DataColumn, is using NVARCHAR instead of VARCHAR. The character itself works in both datatypes, but the behavior is different due to how it is being used in each case:

In the Find_Invalid_Chars() function (i.e. the "other" answer), the string is being converted back into VARCHAR since that is the datatype of the input parameter for that function. In this case it works as expected (although I believe it can be done much more efficiently than that loop, but that's for another time ;-)

In the LIKE query (i.e. the "accepted" answer), the expanded and concatenated result of '%[' + CHAR(127)+ '-' +CHAR(255)+']%' is actually converted into NVARCHAR since that is the datatype of the column it is being compared to (and NVARCHAR has a higher datatype precedence), hence that LIKE function is not behaving as expected: either the CHAR(255) character maps to a different code point, and/or the CHAR(150) character in the column itself maps to a different code point (the CHAR(127) character does not change as it's in the standard ASCII range). In either case, the conversion to NVARCHAR is causing the numerical value of the "En Dash" character ("–") to no longer be within that range. Meaning, the LIKE function is looking for values, y, between 127 and x (where x >= 128), and y for the "En Dash" character is now > x. Whereas in VARCHAR, x = 255 and y = 150.

The quick fix to see that it does work is simply to change the NVARCHAR datatype of the DataColumn column to be VARCHAR (yes, just remove the initial "N"), then re-build the schema, then execute, and the LIKE query will behave as expected.

The following might help explain why making the test column NVARCHAR caused the LIKE query to not match the row:

SELECT UNICODE(CHAR(127)) AS [CHAR(127)],
       UNICODE(CHAR(150)) AS [CHAR(150)],
       UNICODE(CHAR(255)) AS [CHAR(255)];

/*
CHAR(127)     CHAR(150)     CHAR(255)
127           8211          255
*/

As you can see in the results below the query, the "bad dash", which was CHAR(150) became NCHAR(8211) when stored in the NVARCHAR column. And, since that predicate is using a binary collation (generally the correct thing to do in this scenario), it was looking at the code points / values, not the characters. Hence, the LIKE clause was looking for characters with values between 127 and 255, and 8211 is typically not in that range ;-).

P.S. Please keep in mind that the function CHAR(150) can return different characters, or even NULL, based on the default collation of the database in which you execute that function. This is because VARCHAR data is based on code pages, and those are determined by the collation, and the collation being used when executing the CHAR() function is the default collation of the active / current database. This affects values 128 - 255. Values 0 - 127 will always return the same characters, regardless of collation, as those are the standard ASCII character set and are the same across all of the code pages supported in SQL Server (though not in all code pages in general).

P.P.S. ALSO, I just noticed a slight difference in logic between the function and the query (i.e. the two answers from the linked question): CHAR(127) is considered good / valid in the Find_Invalid_Chars() function, yet it's considered bad / invalid in the LIKE query. If it were me, I would consider CHAR(127) valid as it's part of the standard ASCII character set. But, you need to decide what you consider it. Just be aware of that difference in case you do need to adjust the LIKE syntax a little.

Given:

-

The purpose of the query is to find characters in the source table/column that are not handled correctly by some software applications.

and:

-

The data can be in either VARCHAR or NVARCHAR.

I would say that:

-
You don't want to convert NVARCHAR source data into VARCHAR as there might be "best fit" mappings that translate invalid source characters into valid characters, but one or more of your software applications might not use "best fit" mappings.

SELECT NCHAR(178) AS [Unicode], -- Superscript 2 (U+00B2)
       CONVERT(VARCHAR(5), NCHAR(178)
                   COLLATE SQL_Latin1_General_CP1_CI_AS) AS [CodePage-1252],
       CONVERT(VARCHAR(5), NCHAR(178)
                   COLLATE Turkmen_100_CI_AS) AS [CodePage-1250]

/*
Unicode    CodePage-1252    CodePage-1250
²          ²                2
*/

-
It will likely be more reliable to look for characters not in a specific "valid" range as opposed to those in a specific invalid range, especially when dealing with NVARCHAR which holds a lot more than 256 characters.

-
You could get away with a single query if the "valid" range is always between values 0 and 127 (since those values are the same in both cases). But if you need to specify va

Context

StackExchange Database Administrators Q#251740, answer score: 5

Revisions (0)

No revisions yet.