debugsqlModerate
Fix corrupted utf8 nvarchar value that was inserted without N prefix
Viewed 0 times
fixcorruptedwithoutprefixnvarcharutf8valueinsertedthatwas
Problem
Is it possible to restore the UTF8 data from a corrupted nvarchar, that was used without the
For example, in the following snippet, I'd like
N prefix?For example, in the following snippet, I'd like
@n1_fixed to have the correct value based on @n1:declare
@n1 nvarchar(10) = 'Ḿấxiḿứś',
@n2 nvarchar(10) = N'Ḿấxiḿứś';
declare
@n1_fixed nvarchar(10); -- somehow make it have the correct value, based on @n1
select iif(@n1_fixed = @n2, 1, 0)Solution
It is not possible to recover the original value.
Character constants without the
You can observe this by casting the value as varbinary. Example with a
corrupted_string
corrupted_raw_value
correct_string
correct_raw_value
??xi??s
0x3F003F00780069003F003F007300
Ḿấxiḿứś
0x3E1EA51E780069003F1EE91E5B01
Results of same script with an
corrupted_string
corrupted_raw_value
correct_string
correct_raw_value
??xi??ś
0x3F003F00780069003F003F005B01
Ḿấxiḿứś
0x3E1EA51E780069003F1EE91E5B01
Character constants without the
N prefix to denote national characters are translated according to the database default collation code page. In cases where no clear mapping exists for the source character, the value is changed to a fallback character. This fallback may be similar but different character or the value ? when no alternate character exists.You can observe this by casting the value as varbinary. Example with a
SQL_Latin1_General_CP1_CI_AS default database collation, which uses code page 1252:DECLARE @n1 nvarchar(10) = 'Ḿấxiḿứś', @n2 nvarchar(10) = N'Ḿấxiḿứś';
SELECT
@n1 AS corrupted_string
, CAST(@n1 AS varbinary(100)) AS corrupted_raw_value
, @n2 AS correct_string
, CAST(@n2 AS varbinary(100)) AS correct_raw_value;
GOcorrupted_string
corrupted_raw_value
correct_string
correct_raw_value
??xi??s
0x3F003F00780069003F003F007300
Ḿấxiḿứś
0x3E1EA51E780069003F1EE91E5B01
Results of same script with an
Albanian_100_CI_AS database default collation shows the last character is correctly mapped due to the 1250 code page but characters changed to the '?' fallback are lost permanently like the previous example.corrupted_string
corrupted_raw_value
correct_string
correct_raw_value
??xi??ś
0x3F003F00780069003F003F005B01
Ḿấxiḿứś
0x3E1EA51E780069003F1EE91E5B01
Code Snippets
DECLARE @n1 nvarchar(10) = 'Ḿấxiḿứś', @n2 nvarchar(10) = N'Ḿấxiḿứś';
SELECT
@n1 AS corrupted_string
, CAST(@n1 AS varbinary(100)) AS corrupted_raw_value
, @n2 AS correct_string
, CAST(@n2 AS varbinary(100)) AS correct_raw_value;
GOContext
StackExchange Database Administrators Q#318447, answer score: 12
Revisions (0)
No revisions yet.