patternsqlMinor
nvarchar(max) converting to varchar and table optimization
Viewed 0 times
nvarcharvarcharoptimizationmaxandconvertingtable
Problem
I am working with a table that has all character types set to
A large overhaul of this table that consists literally 140 (
My question is in what situations is
Only when you expect to have 4k or more characters?
What should I learn and prepare for when doing this?
Will this improve performance when a clustered index update that affects the clustered key has to update the all non-clustered indexes?
We are having update procedures timing out that are using 75% to 95% of the query execution plan & displayed plan f
nvarchar some of them are nvarchar(max). We are converting all these to varchar and specifying a character width based upon the actual usage in production. The production data uses a range of 2 characters up to 900 characters of actual used width for any given column. We are going to add padding of 10% when applicable.-- Insert statements for procedure here
UPDATE Listings WITH (ROWLOCK)
SET [SubType] = 'S'
WHERE @idSettings = idSettings AND
(@idRetsClass = 0 OR idRetsClass = @idRetsClass)
AND (@idRetsSetting = 0 OR idRetsSetting = @idRetsSetting)
AND IsNew = 1 AND ([SubType] LIKE '%Single Family Home%' OR [SubType] LIKE '%Modular%' OR [SubType] LIKE '%Mobile Home%'
OR [SubType] LIKE '% Story%' OR [SubType] = '' OR [SubType] = 'residential - S' OR [SubType] = '1 House on Lot' OR [SubType] = '2 Houses on Lot'
OR [SubType] = 'Detached' OR [SubType] LIKE '%single family%' OR [SubType] = 'ranch' OR [SubType] = 'Semi-Detached' OR [SubType] = 'single' OR [SubType] = 'one family' OR [SubType] = 'Residential'
OR [SubType] = 'Ranch Type' OR [SubType] = '2 or More Stories' OR [SubType] = 'Cape Cod' OR [SubType] = 'Split Level' OR [SubType] = 'Bi-Level' OR [SubType] = 'Detached Single'
OR [SubType] = 'Single-Family Homes' OR [SubType] = 'house' OR [SubType] = 'detached housing' OR [SubType] = 'det')A large overhaul of this table that consists literally 140 (
nvarchar) columns, 11 being MAX. I am dropping 30 indexes and recreating them afterwards.My question is in what situations is
varchar(max) preferred?Only when you expect to have 4k or more characters?
What should I learn and prepare for when doing this?
Will this improve performance when a clustered index update that affects the clustered key has to update the all non-clustered indexes?
We are having update procedures timing out that are using 75% to 95% of the query execution plan & displayed plan f
Solution
{ This is a little lengthy, perhaps, but your actual problem(s) cannot be solved by looking at execution plans. There are two main issues, and both are architectural. }
Distractions
Let's start with the items that are not your major problem areas. These are things that should be looked into as it definitely helps improve performance to use the datatypes that you need and not just a general, one-size-fits-most datatype. There is a very good reason why the different datatypes exist, and if storing 100 characters in
to MAX, or not to MAX
I am working with a table that has all character types set to
Ok. This is not necessarily a bad thing, though most often there is at least one field that is of a numeric type as an ID. However, there are certainly valid cases for the scenario described so far. And there is nothing inherently bad about
VARCHAR vs NVARCHAR
We are converting all these to
Ok, but why exactly (yes, I know that info and comments that follow this statement add clarity, but I am going in order to preserve the conversational aspect for a reason). Each datatype has its place.
Please note that it is possible for
Prior to converting to
To clarify how
Another thing to consider regarding space taken up by
Real Areas of Concern
The following areas should be a
Distractions
Let's start with the items that are not your major problem areas. These are things that should be looked into as it definitely helps improve performance to use the datatypes that you need and not just a general, one-size-fits-most datatype. There is a very good reason why the different datatypes exist, and if storing 100 characters in
NVARCHAR(MAX) had no negative impact on queries (or any other aspect of the system), then everything would be stored as NVARCHAR(MAX). However, cleaning up these areas won't lead to true scalamability.to MAX, or not to MAX
I am working with a table that has all character types set to
nvarchar some of them are nvarchar(max).Ok. This is not necessarily a bad thing, though most often there is at least one field that is of a numeric type as an ID. However, there are certainly valid cases for the scenario described so far. And there is nothing inherently bad about
MAX fields since they will store the data on the data page (i.e. in row) if the data can fit there. And in that situation it should perform as well as a non-MAX value of that same datatype. But yes, a bunch of MAX type fields is a sign of sloppiness in the data modeling and is far more likely to have most (or all) of that MAX data stored in separate data pages (i.e. off row) that need an extra lookup, hence less efficient.VARCHAR vs NVARCHAR
We are converting all these to
varchar...Ok, but why exactly (yes, I know that info and comments that follow this statement add clarity, but I am going in order to preserve the conversational aspect for a reason). Each datatype has its place.
VARCHAR is 1 byte per character and can represent 256 characters (most of the time) as defined on a single code page. While character values 0 - 127 are the same between code pages, character values between 128 and 255 can change:;WITH chars ([SampleCharacters]) AS
(
SELECT CHAR(42) + ' ' -- *
+ CHAR(65) + ' ' -- A
+ CHAR(126) + ' ' --
-------------------------------
+ CHAR(128) + ' ' -- €
+ CHAR(149) + ' ' -- •
+ CHAR(165) + ' ' -- ¥, Y, ?
+ CHAR(183) + ' ' -- ·, ?
+ CHAR(229) + ' ' -- å, a, ?
)
SELECT chr.SampleCharacters COLLATE SQL_Latin1_General_CP1_CI_AS AS [SQL_Latin1_General_CP1_CI_AS],
chr.SampleCharacters COLLATE SQL_Latin1_General_CP1255_CI_AS AS [SQL_Latin1_General_CP1255_CI_AS],
chr.SampleCharacters COLLATE Thai_CI_AS_KS_WS AS [Thai_CI_AS_KS_WS],
chr.SampleCharacters COLLATE Yakut_100_CS_AS_KS AS [Yakut_100_CS_AS_KS],
chr.SampleCharacters COLLATE Albanian_CS_AI AS [Albanian_CS_AI]
FROM chars chr;
Please note that it is possible for
VARCHAR data to take up 2 bytes per character and represent more than 256 characters. For more information on Double-Byte Characters Sets, please see the following answer: Storing Japanese characters in a table .NVARCHAR is stored as UTF-16 (Little Endian) and is either 2 or 4 bytes per character, which can represent the full Unicode spectrum. So, if your data will need to ever store more characters than can be represented by a single code page, then switching to VARCHAR won't truly help you.Prior to converting to
VARCHAR, you need to make sure that you are not storing any Unicode characters. Try the following query to see if there are any rows that cannot be converted to VARCHAR without losing data:SELECT tbl.PKfield, tbl.SubType
FROM dbo.[Listings] tbl
WHERE tbl.SubType <> CONVERT(NVARCHAR(MAX), CONVERT(VARCHAR(MAX), tbl.SubType))
To clarify how
NVARCHAR works: the max length of an NVARCHAR field is the number of 2-byte characters. Hence, NVARCHAR(50), will allow for a maximum of 100 bytes. How many characters will fit into that 100 bytes depends on how many 4 byte characters there are: none will allow you to fit in all 50 characters, all characters being 4-bytes will only fit 25 characters, and many combinations between.Another thing to consider regarding space taken up by
VARCHAR vs NVARCHAR: starting in SQL Server 2008 (Enterprise and Developer editions only!) you can enable Row or Page compression on Tables, Indexes, and Indexed Views. For situations where much of the data within an NVARCHAR field can actually fit within VARCHAR without any data loss, compression will allow for characters that do fit into VARCHAR to be stored as 1 byte. And only characters that require either 2 or 4 bytes will take up that space. This should remove one of the larger reasons that people often choose to stick with VARCHAR. For more info on Compression, please see the MSDN page for Creating Compressed Tables and Indexes. Please note that data in MAX datatypes that is being stored off-row is not compressible.Real Areas of Concern
The following areas should be a
Context
StackExchange Database Administrators Q#106632, answer score: 9
Revisions (0)
No revisions yet.