HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlModerate

Fix corrupted utf8 nvarchar value that was inserted without N prefix

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fixcorruptedwithoutprefixnvarcharutf8valueinsertedthatwas

Problem

Is it possible to restore the UTF8 data from a corrupted nvarchar, that was used without the N prefix?

For example, in the following snippet, I'd like @n1_fixed to have the correct value based on @n1:

declare
  @n1 nvarchar(10) = 'Ḿấxiḿứś',
  @n2 nvarchar(10) = N'Ḿấxiḿứś';

declare
  @n1_fixed nvarchar(10); -- somehow make it have the correct value, based on @n1

select iif(@n1_fixed = @n2, 1, 0)

Solution

It is not possible to recover the original value.

Character constants without the N prefix to denote national characters are translated according to the database default collation code page. In cases where no clear mapping exists for the source character, the value is changed to a fallback character. This fallback may be similar but different character or the value ? when no alternate character exists.

You can observe this by casting the value as varbinary. Example with a SQL_Latin1_General_CP1_CI_AS default database collation, which uses code page 1252:

DECLARE @n1 nvarchar(10) = 'Ḿấxiḿứś', @n2 nvarchar(10) = N'Ḿấxiḿứś';
SELECT 
      @n1 AS corrupted_string
    , CAST(@n1 AS varbinary(100)) AS corrupted_raw_value
    , @n2 AS correct_string
    , CAST(@n2 AS varbinary(100)) AS correct_raw_value;
GO


corrupted_string
corrupted_raw_value
correct_string
correct_raw_value

??xi??s
0x3F003F00780069003F003F007300
Ḿấxiḿứś
0x3E1EA51E780069003F1EE91E5B01

Results of same script with an Albanian_100_CI_AS database default collation shows the last character is correctly mapped due to the 1250 code page but characters changed to the '?' fallback are lost permanently like the previous example.

corrupted_string
corrupted_raw_value
correct_string
correct_raw_value

??xi??ś
0x3F003F00780069003F003F005B01
Ḿấxiḿứś
0x3E1EA51E780069003F1EE91E5B01

Code Snippets

DECLARE @n1 nvarchar(10) = 'Ḿấxiḿứś', @n2 nvarchar(10) = N'Ḿấxiḿứś';
SELECT 
      @n1 AS corrupted_string
    , CAST(@n1 AS varbinary(100)) AS corrupted_raw_value
    , @n2 AS correct_string
    , CAST(@n2 AS varbinary(100)) AS correct_raw_value;
GO

Context

StackExchange Database Administrators Q#318447, answer score: 12

Revisions (0)

No revisions yet.