When is `nvarchar/nchar` going to be used with SQL Server 2019?
Problem
With SQL Server 2019, Microsoft introduces UTF-8 support for the CHAR and VARCHAR data types and says:

This feature may provide significant storage savings, depending on the character set in use. For example, changing an existing column data type with ASCII strings from NCHAR(10) to CHAR(10) using a UTF-8 enabled collation translates into nearly a 50% reduction in storage requirements. This reduction is because NCHAR(10) requires 22 bytes for storage, whereas CHAR(10) requires 12 bytes for the same Unicode string.

UTF-8 seems to support every script, so basically we can start storing Unicode data in varchar and char columns. And, as the documentation says, this can reduce the size of tables and indexes, and from there we can get even better performance, because a smaller amount of data is read.

I am wondering: does this mean we can stop using nvarchar and nchar columns, which implement UTF-16?

Can anyone point out a scenario and a reason not to use the char data types with UTF-8 encoding and continue using the n-char ones?
Solution
UTF-8 support gives you a new set of options. Potential space savings (without row or page compression) is one consideration, but the choice of type and encoding should probably be primarily made on the basis of actual requirements for comparison, sorting, data import, and export.
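The headline space saving quoted in the question is easy to verify outside SQL Server: for pure ASCII data, UTF-16 (the encoding behind nchar/nvarchar) always spends two bytes per character, while UTF-8 spends one. A minimal Python sketch of that arithmetic (illustrative only; the 22-byte and 12-byte figures quoted in the question additionally include SQL Server's own per-value overhead, which is not modelled here):

```python
# Raw encoded size of a 10-character ASCII string in the two encodings.
# SQL Server's nchar/nvarchar types store UTF-16 (little-endian).
s = "0123456789"

utf16_bytes = len(s.encode("utf-16-le"))  # 2 bytes per ASCII character
utf8_bytes = len(s.encode("utf-8"))       # 1 byte per ASCII character

print(utf16_bytes)  # 20
print(utf8_bytes)   # 10
```

For ASCII-only data the payload really does halve; the savings shrink or reverse as soon as the data contains characters that UTF-8 encodes in two or more bytes.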
You may need to change more than you think, since e.g. an nchar(1) type provides two bytes of storage. That is enough to store any character in the BMP (code points 000000 to 00FFFF). Some of the characters in that range would be encoded with just 1 byte in UTF-8, while others would require 2 or even 3 bytes (see this comparison chart for more details). Therefore, ensuring coverage of the same set of characters in UTF-8 would require char(3).

For example:

DECLARE @T AS table
(
    n integer PRIMARY KEY,
    UTF16 nchar(1) COLLATE Latin1_General_CI_AS,
    UTF8 char(1) COLLATE Latin1_General_100_CI_AS_SC_UTF8
);

INSERT @T (n, UTF16, UTF8)
SELECT 911, NCHAR(911), NCHAR(911);

gives the familiar error:

Msg 8152, Level 16, State 30, Line xxx
String or binary data would be truncated.

Or, if trace flag 460 is active:

Msg 2628, Level 16, State 1, Line xxx
String or binary data would be truncated in table '@T', column 'UTF8'. Truncated value: ' '.

Expanding the UTF8 column to char(2) or varchar(2) resolves the error for NCHAR(911):

DECLARE @T AS table
(
    n integer PRIMARY KEY,
    UTF16 nchar(1) COLLATE Latin1_General_CI_AS,
    UTF8 varchar(2) COLLATE Latin1_General_100_CI_AS_SC_UTF8
);

INSERT @T (n, UTF16, UTF8)
SELECT 911, NCHAR(911), NCHAR(911);

However, if it was e.g. NCHAR(8364), you would need to expand the column further, to char(3) or varchar(3).

Note also that the UTF-8 collations all use supplementary characters, so they will not work with replication.

Aside from anything else, UTF-8 support is only in preview at this time, so it is not available for production use.
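The column-width expansion described above follows directly from how UTF-8 encodes the BMP: NCHAR(911) is Ώ (U+038F), which UTF-8 encodes in two bytes, while NCHAR(8364) is the euro sign (U+20AC), which needs three; plain ASCII needs only one. A quick Python check of those code points (this only demonstrates the encoding arithmetic, not SQL Server's collation behaviour):

```python
# UTF-8 byte counts for the code points used in the answer's examples.
for codepoint in (65, 911, 8364):   # 'A', NCHAR(911), NCHAR(8364)
    ch = chr(codepoint)
    print(codepoint, repr(ch), len(ch.encode("utf-8")))

# 65   -> 1 byte  (fits char(1) with a UTF-8 collation)
# 911  -> 2 bytes (needs char(2)/varchar(2))
# 8364 -> 3 bytes (needs char(3)/varchar(3))

# Supplementary characters (outside the BMP) take 4 bytes in UTF-8
# and also 4 bytes (a surrogate pair) in UTF-16:
supp = chr(0x1F600)
print(len(supp.encode("utf-8")), len(supp.encode("utf-16-le")))  # 4 4
```

So for supplementary characters the two encodings are the same size; the storage differences all come from how BMP characters split across the 1-, 2-, and 3-byte UTF-8 ranges.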
Code Snippets

DECLARE @T AS table
(
    n integer PRIMARY KEY,
    UTF16 nchar(1) COLLATE Latin1_General_CI_AS,
    UTF8 char(1) COLLATE Latin1_General_100_CI_AS_SC_UTF8
);

INSERT @T (n, UTF16, UTF8)
SELECT 911, NCHAR(911), NCHAR(911);

DECLARE @T AS table
(
    n integer PRIMARY KEY,
    UTF16 nchar(1) COLLATE Latin1_General_CI_AS,
    UTF8 varchar(2) COLLATE Latin1_General_100_CI_AS_SC_UTF8
);

INSERT @T (n, UTF16, UTF8)
SELECT 911, NCHAR(911), NCHAR(911);
Context
StackExchange Database Administrators Q#218479, answer score: 13
Revisions (0)
No revisions yet.