patternsqlModerate
Are duplicated NVARCHAR values stored as copies in SQL Server?
Viewed 0 times
storednvarcharcopiesaresqlserverduplicatedvalues
Problem
I'm designing a table that will contain a lot of rows. So need to be careful not to store to much information. One of the columns is a NVARCHAR(MAX) column and it contains the address of our customers. As addresses do not change often, this column will contain many repeated values and thus contains quite some redundancy.
So I was wondering if I need to normalize this myself by maintaining some sort of look-up table to address strings (note that if an address changes I need to maintain history - so it's not a matter of usual normalization), or if SQL Server is pointing to the same reference of the string behind the scenes. Or maybe it offers a column option to do so. Another approach that came into my mind is to use COMPRESS but I guess this does not make sense as the data itself (i.e. the address) is not long.
Reading/writing performance is not so much of a concern as the data will be accumulated over time.
So I was wondering if I need to normalize this myself by maintaining some sort of look-up table to address strings (note that if an address changes I need to maintain history - so it's not a matter of usual normalization), or if SQL Server is pointing to the same reference of the string behind the scenes. Or maybe it offers a column option to do so. Another approach that came into my mind is to use COMPRESS but I guess this does not make sense as the data itself (i.e. the address) is not long.
Reading/writing performance is not so much of a concern as the data will be accumulated over time.
Solution
Under "normal" conditions, no, data in
Using one of the Data Compression options is probably your best bet. Here are some things to consider:
Since the data won't really be changing, you should look into the Columnstore Index options (also available in Azure SQL Database):
Columnstore compression should be better than Page compression.
Also, you should probably avoid using the
For more details on working with character data, please see the following post of mine:
How Many Bytes Per Character in SQL Server: a Completely Complete Guide
VARCHAR and NVARCHAR columns is not de-duped (although duplicate attribute and/or element names in a single XML value are reduced to a unique instance).Using one of the Data Compression options is probably your best bet. Here are some things to consider:
- Unicode Compression (part of Row Compression) only works on
NVARCHAR(1 - 4000), notNVARCHAR(MAX)(please vote for / support: Unicode compression NVARCHAR(MAX)).
- Page Compression can work with
NVARCHAR(MAX), but only for in-row data. Off-row data (LOB pages) is not compressed.
Since the data won't really be changing, you should look into the Columnstore Index options (also available in Azure SQL Database):
- Columnstore indexes: Overview
- Columnstore indexes - Design guidance
Columnstore compression should be better than Page compression.
Also, you should probably avoid using the
SPARSE option, due to:- The
SPARSEoption offers no benefit compared to Data Compression.
- It is mainly intended for columns sets / wide tables (i.e. up to 30,000 columns).
- It mostly helps with fixed-length datatypes (e.g.
INT,DATETIME). So, prior to Data Compression being available, it was useful forCHARandNCHAR, but not forVARCHARorNVARCHARas they don't take up space whenNULL.
- It only benefits columns set to
NULL.
- It slightly hurts non-
NULLvalues by adding 2 bytes to each one.
- You should probably have
NULLfor 50% - 60% of the rows in order to get enough savings for it to be worth using this option.
For more details on working with character data, please see the following post of mine:
How Many Bytes Per Character in SQL Server: a Completely Complete Guide
Context
StackExchange Database Administrators Q#279876, answer score: 12
Revisions (0)
No revisions yet.