gotchasqlModerate
Why does the varchar datatype allow unicode values?
Viewed 0 times
whythedatatypevarcharunicodeallowdoesvalues
Problem
I have a table with a varchar column. It is allowing Trademark(™), copyright(©) and other Unicode characters as shown below.
But the definition of varchar says, it allows non-unicode string data. But the Trademark(™) and Registered(®) symbols are Unicode characters. Does the definition contradicts the property of varchar datatype? I read couple of links like first one and second one. But still I could not understand why it allows unicode string when the definition says that it allows only non-unicode string values.
Create table VarcharUnicodeCheck
(
col1 varchar(100)
)
insert into VarcharUnicodeCheck (col1) values ('MyCompany')
insert into VarcharUnicodeCheck (col1) values ('MyCompany™')
insert into VarcharUnicodeCheck (col1) values ('MyCompany░')
insert into VarcharUnicodeCheck (col1) values ('MyCompanyï')
insert into VarcharUnicodeCheck (col1) values ('MyCompany')
select * from VarcharUnicodeCheckBut the definition of varchar says, it allows non-unicode string data. But the Trademark(™) and Registered(®) symbols are Unicode characters. Does the definition contradicts the property of varchar datatype? I read couple of links like first one and second one. But still I could not understand why it allows unicode string when the definition says that it allows only non-unicode string values.
Solution
But the Trademark(™) and Registered(®) symbols are Unicode characters.
You are wrong here. Your strings contain only ascii characters.
Here is a simple test that shows you that your characters are all ascii (+ some extended ascii with ascii codes between 128 and 255):
Here you can clearly see that all your characters are 1-byte encoded:
Yes they are not pure ascii characters but they are Extended ASCII.
Here I show you real Unicode character
Finally, you can see that Trademark(™) Unicode character has 8482 code and not 153:
You are wrong here. Your strings contain only ascii characters.
Here is a simple test that shows you that your characters are all ascii (+ some extended ascii with ascii codes between 128 and 255):
declare @VarcharUnicodeCheck table
(
col1 varchar(100)
)
insert into @VarcharUnicodeCheck (col1) values ('MyCompany')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany™')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany░')
insert into @VarcharUnicodeCheck (col1) values ('MyCompanyï')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany')
select *,
right(col1, 1)as last_char,
ascii(right(col1, 1)) as_last_char_ascii
from @VarcharUnicodeCheck;Here you can clearly see that all your characters are 1-byte encoded:
Yes they are not pure ascii characters but they are Extended ASCII.
Here I show you real Unicode character
Trademark(™) and its code and binary representation:declare @t table (uni_ch nchar(1), ascii_ch char(1));
insert into @t values (N'™', '™');
select unicode(uni_ch) as [unicode of ™],
ascii(ascii_ch) [ascii of ™],
cast(uni_ch as varbinary(10)) as [uni_ch as varbinary],
cast(ascii_ch as varbinary(10)) as [ascii_ch as varbinary]
from @t;Finally, you can see that Trademark(™) Unicode character has 8482 code and not 153:
select nchar(8482), nchar(153)Code Snippets
declare @VarcharUnicodeCheck table
(
col1 varchar(100)
)
insert into @VarcharUnicodeCheck (col1) values ('MyCompany')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany™')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany░')
insert into @VarcharUnicodeCheck (col1) values ('MyCompanyï')
insert into @VarcharUnicodeCheck (col1) values ('MyCompany')
select *,
right(col1, 1)as last_char,
ascii(right(col1, 1)) as_last_char_ascii
from @VarcharUnicodeCheck;declare @t table (uni_ch nchar(1), ascii_ch char(1));
insert into @t values (N'™', '™');
select unicode(uni_ch) as [unicode of ™],
ascii(ascii_ch) [ascii of ™],
cast(uni_ch as varbinary(10)) as [uni_ch as varbinary],
cast(ascii_ch as varbinary(10)) as [ascii_ch as varbinary]
from @t;select nchar(8482), nchar(153)Context
StackExchange Database Administrators Q#196570, answer score: 14
Revisions (0)
No revisions yet.