HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Nvarchar to varchar to nvarchar characters conversion

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
charactersconversionnvarcharvarchar

Problem

I had a table filled with some greek text rows as nvarchar(2000).

Lately, I altered the type of the column to varchar(4000) but I realized that some greek characters were shown as question marks.

So, I tried to change back to nvarchar(4000) in order to fix it as I suppose that the characters' unicode still remains the same.

I am just wondering, is there any way to fix this instead of restoring the backup I created before altering the table?

Solution

No, there is no way to "fix" the data because the data is no longer there. When you converted to VARCHAR, the underlying values for each character were changed into the ASCII value for ?. This is not a display issue, those characters are now physically a regular question mark. You will need to do a restore from a backup, unfortunately.

The following example code shows that once a Unicode character is converted to VARCHAR (assuming the Code Page denoted by the Collation does not support that character), that it becomes a regular 'ol question mark, and forever shall it remain as such:

DECLARE @Character NCHAR(1) = NCHAR(0x3525);
SELECT @Character AS [TheCharacter],
UNICODE(@Character) AS [DecimalCodePoint],
ASCII(@Character) AS [DecimalValueOfVarchar],
UNICODE(CONVERT(VARCHAR(5), @Character)) AS [ConvertToVarchar],
UNICODE(CONVERT(NVARCHAR(5), CONVERT(VARCHAR(5), @Character)))
AS [ConvertToVarcharAndBack],
ASCII('?') AS [VarcharQuestionMark],
UNICODE(N'?') AS [UnicodeQuestionMark];

-- 㔥 13605 63 63 63 63 63


The following example shows an instance of a Unicode character that is highly doubtful (at least at this time) to be supported in most fonts, hence it appears as a square box, but the UNICODE built-n function shows that the underlying code is still the correct Unicode Code Point:

SELECT NCHAR(0xABBF), N'ꮿ', UNICODE(N'ꮿ');

-- ꮿ ꮿ 43967


The actual character can be seen here: Cherokee Small Letter YA U+ABBF. This is a display issue, and many characters that are not represented in various fonts will display in the same manner without altering the actual value of the character, but they are still distinct characters.

Context

StackExchange Database Administrators Q#149235, answer score: 8

Revisions (0)

No revisions yet.