debugsqlMinor
Cannot remove char 0x0000 from NVARCHAR column
Viewed 0 times
cannotnvarcharcolumncharremovefrom0x0000
Problem
So, I know all about the Replace function and a bug with char(0).
I have a column (
I'm using SQL Server 2008 R2.
The collation for the column is:
I have tried all the stuff online that I can possibly find, and NOTHING will get the stinking char(0) characters out of the column.
Here's my latest attempt, with BAFFLING (bug in sql server?) results.
I have a function that loops through each char and replaces 0x0000 with a specific char.
DECLARE @ShortDescription NVARCHAR(128), @SupplierId INT, @Language CHAR(2)
SELECT TOP 1 @ShortDescription = ShortDescription,
@SupplierId = SupplierID,
@Language = Language
FROM Supplier_Multilingual
WHERE ShortDescription LIKE '%' + CHAR(0x00) + '%'
SET @ShortDescription = REPLACE(dbo.ReplaceCharZero(@ShortDescription, ' '), '-', ' ')
UPDATE dbo.Supplier_MultiLingual
SET ShortDescription = NULL
WHERE SupplierID = @SupplierId
AND Language = @Language
UPDATE dbo.Supplier_MultiLingual
SET ShortDescription = dbo.ReplaceCharZero(@ShortDescription, '')
WHERE SupplierID = @SupplierId
AND Language = @Language
SELECT *
FROM Supplier_Multilingual
WHERE SupplierId = @SupplierId
AND Language = @Language
AND ShortDescription LIKE '%' + CHAR(0x00) + '%'
ROLLBACK TRAN
I have a column (
NVARCHAR(128)) that has some NCHAR(0x0000) characters from a bad import.I'm using SQL Server 2008 R2.
The collation for the column is:
SQL_Latin1_General_CP1_CI_AS.I have tried all the stuff online that I can possibly find, and NOTHING will get the stinking char(0) characters out of the column.
Here's my latest attempt, with BAFFLING (bug in sql server?) results.
I have a function that loops through each char and replaces 0x0000 with a specific char.
ALTER FUNCTION dbo.ReplaceCharZero
(
@testString NVARCHAR(MAX),
@charToReplaceWith NCHAR(1) = ' '
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE
@i INT = 1 ,
@fixedString NVARCHAR(MAX) = ''
WHILE @i
And here's what I do to test:
BEGIN TRANDECLARE @ShortDescription NVARCHAR(128), @SupplierId INT, @Language CHAR(2)
SELECT TOP 1 @ShortDescription = ShortDescription,
@SupplierId = SupplierID,
@Language = Language
FROM Supplier_Multilingual
WHERE ShortDescription LIKE '%' + CHAR(0x00) + '%'
SET @ShortDescription = REPLACE(dbo.ReplaceCharZero(@ShortDescription, ' '), '-', ' ')
UPDATE dbo.Supplier_MultiLingual
SET ShortDescription = NULL
WHERE SupplierID = @SupplierId
AND Language = @Language
UPDATE dbo.Supplier_MultiLingual
SET ShortDescription = dbo.ReplaceCharZero(@ShortDescription, '')
WHERE SupplierID = @SupplierId
AND Language = @Language
SELECT *
FROM Supplier_Multilingual
WHERE SupplierId = @SupplierId
AND Language = @Language
AND ShortDescription LIKE '%' + CHAR(0x00) + '%'
ROLLBACK TRAN
In my test, I grab the column as a variable, I run my function on it to strip out the 0x0000, then I update the original column with a NULL, then I update it to my fixed variable, and then I run a query to see if 0x0000` chars still exist, which they do.Solution
A bug in the
Collations define how certain characters will compare against other characters. Sometimes there are rules about certain combinations of characters equating to one or more other characters. And there are rules about characters such as
Returns:
So let's take a look at this behavior using queries based on the ones found in @Max's answer. I added the
this is
0x7400680069007300200069007300000061002000740065007300740000006F00660020006E0075006C006C00
this is
The first result shows that the string ends just after the "is" due to
But should we be expecting that
Add the following query to the set above and re-run the batch.
You should get the following additional result:
this is~a test~of null
So the
Ok, so that only addressed the issue of
Now we will use an adaptation of the main query from @Max's answer. I again added an extra
Found a NULL char (Code Point = 0) at position: 8
Found a NULL char (Code Point = 32) at position: 10
Found a NULL char (Code Point = 0) at position: 15
Found a NULL char (Code Point = 32) at position: 18
Found a NULL char (Code Point = 0) at position: 15
Summary:
REPLACE function? Or in SQL Server in general? I'm not so sure about that. The only "issue" here is merely not fully understanding how string comparisons are handled.Collations define how certain characters will compare against other characters. Sometimes there are rules about certain combinations of characters equating to one or more other characters. And there are rules about characters such as
0x00 (null) and 0x20 (space) equating to each other or other characters. And, to make life more interesting, there are a few nuances specific to VARCHAR data using a SQL Server Collation (i.e. one starting with SQL_), as the following example shows:SELECT REPLACE('VARCHAR with SQL_Latin1_General_CP1_CI_AS'+CHAR(0)+'Matches', CHAR(0),
': ' COLLATE SQL_Latin1_General_CP1_CI_AS);
SELECT REPLACE(N'NVARCHAR with SQL_Latin1_General_CP1_CI_AS'+NCHAR(0)+N'Matches', NCHAR(0),
N': ' COLLATE SQL_Latin1_General_CP1_CI_AS);
SELECT REPLACE('VARCHAR with Latin1_General_100_CI_AS'+CHAR(0)+'Matches', CHAR(0),
': ' COLLATE Latin1_General_100_CI_AS);
SELECT REPLACE(N'NVARCHAR with Latin1_General_100_CI_AS'+NCHAR(0)+N'Matches', NCHAR(0),
N': ' COLLATE Latin1_General_100_CI_AS);
Returns:
VARCHAR with SQL_Latin1_General_CP1_CI_AS: Matches
NVARCHAR with SQL_Latin1_General_CP1_CI_AS
VARCHAR with Latin1_General_100_CI_AS
NVARCHAR with Latin1_General_100_CI_AS
So let's take a look at this behavior using queries based on the ones found in @Max's answer. I added the
N prefix to the string literals and to the CHAR(0), and I also added an extra NCHAR(0) just to make the next part easier to see. And I added queries to show the actual Code Points being used (to prove that the 0x0000 values are really in there, and a call to REPLACE() to see if that really has a bug in it).DECLARE @Data NVARCHAR(255);
SELECT @Data = N'this is' + NCHAR(0) + N'a test' + NCHAR(0) + N'of null';
SELECT @Data;
SELECT CONVERT(VARBINARY(50), @Data);
SELECT REPLACE(@Data, NCHAR(0), N'~');
this is
0x7400680069007300200069007300000061002000740065007300740000006F00660020006E0075006C006C00
this is
The first result shows that the string ends just after the "is" due to
(null) termination. The second result shows the underlying codes, and I emphasized the two instances of the 0x0000 character. The third result shows that the REPLACE function doesn't seem to match the 0x0000 character to the NCHAR(0) that is passed in.But should we be expecting that
NCHAR(0) would be matching here? We can effectively disable all of the equivalence rules that are usually applied to string comparisons by forcing a binary collation. We will use a _BIN2 collation since the _BIN collations are deprecated and shouldn't be used unless you have a specific need for them.Add the following query to the set above and re-run the batch.
SELECT REPLACE(@Data, NCHAR(0) COLLATE Latin1_General_100_BIN2, N'~');
You should get the following additional result:
this is~a test~of null
So the
REPLACE function does in fact work, and this was tested on both SQL Server 2008 R2, SP3 as well as SQL Server 2012 SP2.Ok, so that only addressed the issue of
REPLACE not working with NCHAR(0), but did not address NCHAR(0) equating to a space (i.e. NCHAR(32) or NCHAR(0x20)).Now we will use an adaptation of the main query from @Max's answer. I again added an extra
NCHAR(0) to the test string (really just replaced the space at position 8 with it), and I added the Code Point of the matching character to the RAISERROR message.SET NOCOUNT ON;
GO
DECLARE @Data NVARCHAR(255);
SELECT @Data = N'this is' + NCHAR(0) + N'a test' + NCHAR(0) + N'of null';
DECLARE @i INT,
@CodePoint INT;
SET @i = 1;
WHILE @i
This query (with the COLLATE clause still commented out) will return:
Found a NULL char (Code Point = 32) at position: 5Found a NULL char (Code Point = 0) at position: 8
Found a NULL char (Code Point = 32) at position: 10
Found a NULL char (Code Point = 0) at position: 15
Found a NULL char (Code Point = 32) at position: 18
These are the same positions reported in @Max's test, but now it shows what Code Point it is matching in each case. And yes, it is equating to both 32 and 0.
Now, uncomment the COLLATE clause and re-run it. It will return:
Found a NULL char (Code Point = 0) at position: 8Found a NULL char (Code Point = 0) at position: 15
Another way to accomplish this, and without using the COLLATE clause, is to change the IF statement to be:
IF ( UNICODE(SUBSTRING(@Data, @i, 1)) = 0 )
Of course, neither of these two fixes -- the WHILE loop using either the COLLATE clause or the UNICODE() function -- are needed to address the original problem of removing the 0x0000 characters from the input data since the simple REPLACE (using the COLLATE` clause) handles that.Summary:
- If you are wanting to replace / re
Context
StackExchange Database Administrators Q#82094, answer score: 7
Revisions (0)
No revisions yet.