patternsqlMinor
Table size increased after changing datatype from nchar to varchar
Viewed 0 times
afterdatatypesizeincreasedncharvarcharchangingfromtable
Problem
I am facing a strange behavior when changing datatype from nchar(100) to varchar(100) for one column.
I understand if I change datatype of a column where data is present then it can increase size but in our case initial table is empty and datatype is changed before table is filled with data. Justification for this on our side is that original table is huge so we copied all the data to a temp table by casting to varchar, truncated the original table, changed datatype and copied the data back. To our surprise, size of the data was still much bigger than expected.
If instead of truncating, we drop and recreate the table then size is very small.
Hope someone can help me understand why SQL server does that. I suppose it is due to how SQL Server maintains internal structure of the table but would be good get some explanation.
Here is a sample code to reproduce this behavior on SQL Server 2019.
```
DROP TABLE IF EXISTS dbo.Base;
CREATE TABLE dbo.Base(SomeText varchar(100));
DECLARE @i AS smallint = 1;
WHILE @i <= 10000
BEGIN
INSERT INTO dbo.Base(SomeText)
SELECT @i;
SET @i+=1;
END
DROP TABLE IF EXISTS dbo.Base_NChar;
CREATE TABLE dbo.Base_NChar(SomeText nchar(100));
ALTER TABLE dbo.Base_NChar ALTER COLUMN SomeText varchar(100);
INSERT INTO dbo.Base_NChar(SomeText)
SELECT SomeText
FROM dbo.Base;
DROP TABLE IF EXISTS dbo.Base_New;
CREATE TABLE dbo.Base_New(SomeText varchar(100));
INSERT INTO dbo.Base_New(SomeText)
SELECT SomeText
FROM dbo.Base;
SELECT
t.name AS TableName,
s.name AS SchemaName,
p.rows,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
(SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN sys.schemas s ON
I understand if I change datatype of a column where data is present then it can increase size but in our case initial table is empty and datatype is changed before table is filled with data. Justification for this on our side is that original table is huge so we copied all the data to a temp table by casting to varchar, truncated the original table, changed datatype and copied the data back. To our surprise, size of the data was still much bigger than expected.
If instead of truncating, we drop and recreate the table then size is very small.
Hope someone can help me understand why SQL server does that. I suppose it is due to how SQL Server maintains internal structure of the table but would be good get some explanation.
Here is a sample code to reproduce this behavior on SQL Server 2019.
```
DROP TABLE IF EXISTS dbo.Base;
CREATE TABLE dbo.Base(SomeText varchar(100));
DECLARE @i AS smallint = 1;
WHILE @i <= 10000
BEGIN
INSERT INTO dbo.Base(SomeText)
SELECT @i;
SET @i+=1;
END
DROP TABLE IF EXISTS dbo.Base_NChar;
CREATE TABLE dbo.Base_NChar(SomeText nchar(100));
ALTER TABLE dbo.Base_NChar ALTER COLUMN SomeText varchar(100);
INSERT INTO dbo.Base_NChar(SomeText)
SELECT SomeText
FROM dbo.Base;
DROP TABLE IF EXISTS dbo.Base_New;
CREATE TABLE dbo.Base_New(SomeText varchar(100));
INSERT INTO dbo.Base_New(SomeText)
SELECT SomeText
FROM dbo.Base;
SELECT
t.name AS TableName,
s.name AS SchemaName,
p.rows,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
(SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN sys.schemas s ON
Solution
TL;DR, a dropped fixed length column remains in table meta-data until the table is rebuilt or recreated, even if the table was empty when altered. New rows will contain the dropped fixed length column artifact until the table is rebuilt or recreated. SQL Server does not currently have an optimization to cleanup column meta-data when an empty table is altered.
One can observe the internals of this behavior with the help of undocumented internal system tables and
table_name
column_name
column_id
name
max_length
is_dropped
Base_NChar
NULL
67108865
nchar
200
1
Base_NChar
SomeText
1
varchar
100
0
As you can see, the
Table
Table
Table
Justification for this on our side is that original table is huge so
we copied all the data to a temp table by casting to varchar,
truncated the original table, changed datatype and copied the data
back.
Be aware of another gotcha with the above technique. Simply casting the
Below is the script I used to get this information, including a modified version of a
```
--utility proc to dump pages
CREATE OR ALTER PROC #usp_DumpPages
@ObjectName nvarchar(247) --name of table or indexed view
, @IndexName sysname --name of index or NULL for heap
, @PartitionNumber int = NULL --partition number or NULL for all partitions
, @dbcc_page_print_option int = NULL --
, @page_count_limit int = 1
AS
SET NOCOUNT, XACT_ABORT ON;
DECLARE
@database_id int = DB_ID()
, @object_id int = OBJECT_ID(@ObjectName)
, @index_id int = CASE WHEN @I
One can observe the internals of this behavior with the help of undocumented internal system tables and
DBCC PAGE.CREATE TABLE dbo.Base_NChar(SomeText nchar(100));
ALTER TABLE dbo.Base_NChar ALTER COLUMN SomeText varchar(100);
SELECT
t.name AS table_name
, c.name AS column_name
, ipc.partition_column_id AS column_id
, ty.name
, ipc.max_length
, ipc.is_dropped
FROM sys.tables AS t
JOIN sys.system_internals_partitions ip ON ip.object_id = t.object_id
JOIN sys.system_internals_partition_columns ipc on ipc.partition_id = ip.partition_id
JOIN sys.types ty ON ty.system_type_id = ipc.system_type_id
LEFT JOIN sys.columns AS c ON c.object_id = t.object_id AND ipc.partition_column_id = c.column_id
WHERE ip.object_id = object_id('dbo.Base_NChar');table_name
column_name
column_id
name
max_length
is_dropped
Base_NChar
NULL
67108865
nchar
200
1
Base_NChar
SomeText
1
varchar
100
0
As you can see, the
nchar column still exists in meta-data. Furthermore, the dropped column is still present in newly inserted rows. Here are snippets of a dump of the first row (slot 0) of the first page for each table in your question:Table
dbo.Base shows the 1 byte Column 1 value of the varchar(100) column:Slot 0 Offset 0x60 Length 12
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 12
Memory Dump @0x000000B45A3F0060
0000000000000000: 30000400 01000001 000c0031 0..........1
Slot 0 Column 1 Offset 0xb Length 1 Length (physical) 1
SomeText = 1Table
dbo.Base_NNChar shows the 200 byte dropped nchar column (Column 67108865) and the 1 byte varchar(100) Column 1:Slot 0 Offset 0x60 Length 212
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 212
Memory Dump @0x000000B45A3F0060
0000000000000000: 3000cc00 00000000 c4d86da1 fa7f0000 ffffffff 0.Ì.....ÄØm¡ú...ÿÿÿÿ
0000000000000014: ffffffff ff658e9d fa7f0000 6ae760cc 4c020000 ÿÿÿÿÿeŽ.ú...jç`ÌL...
0000000000000028: 00000000 00000000 01000000 b4000000 ffffffff ............´...ÿÿÿÿ
000000000000003C: ffffffff feffffff ffffffff 68e0a26f 4b020000 ÿÿÿÿþÿÿÿÿÿÿÿhà¢oK...
0000000000000050: d099f3cb 4c020000 f1907aa1 fa7f0000 1099f3cb Ð.óËL...ñ.z¡ú.....óË
0000000000000064: 4c020000 00000000 00000000 306f3f5a b4000000 L...........0o?Z´...
0000000000000078: c4d86da1 fa7f0000 d099f3cb 4c020000 14887aa1 ÄØm¡ú...Ð.óËL.....z¡
000000000000008C: fa7f0000 10e760cc 4c020000 d099f3cb 4c020000 ú....ç`ÌL...Ð.óËL...
00000000000000A0: c0e760cc 4c020000 9086f3cb 4c020000 0089f3cb Àç`ÌL....†óËL....‰óË
00000000000000B4: 4c020000 00000000 00000000 10000000 00000000 L...................
00000000000000C8: 5087f3cb 02000001 00d40031 P‡óË.....Ô.1
Slot 0 Column 67108865 Offset 0x4 Length 0 Length (physical) 200
DROPPED = NULL
Slot 0 Column 1 Offset 0xd3 Length 1 Length (physical) 1
SomeText = 1Table
dbo.Base_New shows the 1 byte Column 1 value of the varchar(100) column, just like dbo.Base:Slot 0 Offset 0x60 Length 12
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 12
Memory Dump @0x000000B4607F0060
0000000000000000: 30000400 01000001 000c0031 0..........1
Slot 0 Column 1 Offset 0xb Length 1 Length (physical) 1
SomeText = 1Justification for this on our side is that original table is huge so
we copied all the data to a temp table by casting to varchar,
truncated the original table, changed datatype and copied the data
back.
Be aware of another gotcha with the above technique. Simply casting the
nchar(100) value to varchar(100) will retain the trailing spaces from the nchar value. You'll also need to trim the trailing spaces to avoid the padding the varchar value: RTRIM(CAST(SomeText AS varchar(100)))Below is the script I used to get this information, including a modified version of a
DBCC PAGE utility proc I use in some of my SQL presentations.```
--utility proc to dump pages
CREATE OR ALTER PROC #usp_DumpPages
@ObjectName nvarchar(247) --name of table or indexed view
, @IndexName sysname --name of index or NULL for heap
, @PartitionNumber int = NULL --partition number or NULL for all partitions
, @dbcc_page_print_option int = NULL --
, @page_count_limit int = 1
AS
SET NOCOUNT, XACT_ABORT ON;
DECLARE
@database_id int = DB_ID()
, @object_id int = OBJECT_ID(@ObjectName)
, @index_id int = CASE WHEN @I
Code Snippets
CREATE TABLE dbo.Base_NChar(SomeText nchar(100));
ALTER TABLE dbo.Base_NChar ALTER COLUMN SomeText varchar(100);
SELECT
t.name AS table_name
, c.name AS column_name
, ipc.partition_column_id AS column_id
, ty.name
, ipc.max_length
, ipc.is_dropped
FROM sys.tables AS t
JOIN sys.system_internals_partitions ip ON ip.object_id = t.object_id
JOIN sys.system_internals_partition_columns ipc on ipc.partition_id = ip.partition_id
JOIN sys.types ty ON ty.system_type_id = ipc.system_type_id
LEFT JOIN sys.columns AS c ON c.object_id = t.object_id AND ipc.partition_column_id = c.column_id
WHERE ip.object_id = object_id('dbo.Base_NChar');Slot 0 Offset 0x60 Length 12
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 12
Memory Dump @0x000000B45A3F0060
0000000000000000: 30000400 01000001 000c0031 0..........1
Slot 0 Column 1 Offset 0xb Length 1 Length (physical) 1
SomeText = 1Slot 0 Offset 0x60 Length 212
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 212
Memory Dump @0x000000B45A3F0060
0000000000000000: 3000cc00 00000000 c4d86da1 fa7f0000 ffffffff 0.Ì.....ÄØm¡ú...ÿÿÿÿ
0000000000000014: ffffffff ff658e9d fa7f0000 6ae760cc 4c020000 ÿÿÿÿÿeŽ.ú...jç`ÌL...
0000000000000028: 00000000 00000000 01000000 b4000000 ffffffff ............´...ÿÿÿÿ
000000000000003C: ffffffff feffffff ffffffff 68e0a26f 4b020000 ÿÿÿÿþÿÿÿÿÿÿÿhà¢oK...
0000000000000050: d099f3cb 4c020000 f1907aa1 fa7f0000 1099f3cb Ð.óËL...ñ.z¡ú.....óË
0000000000000064: 4c020000 00000000 00000000 306f3f5a b4000000 L...........0o?Z´...
0000000000000078: c4d86da1 fa7f0000 d099f3cb 4c020000 14887aa1 ÄØm¡ú...Ð.óËL.....z¡
000000000000008C: fa7f0000 10e760cc 4c020000 d099f3cb 4c020000 ú....ç`ÌL...Ð.óËL...
00000000000000A0: c0e760cc 4c020000 9086f3cb 4c020000 0089f3cb Àç`ÌL....†óËL....‰óË
00000000000000B4: 4c020000 00000000 00000000 10000000 00000000 L...................
00000000000000C8: 5087f3cb 02000001 00d40031 P‡óË.....Ô.1
Slot 0 Column 67108865 Offset 0x4 Length 0 Length (physical) 200
DROPPED = NULL
Slot 0 Column 1 Offset 0xd3 Length 1 Length (physical) 1
SomeText = 1Slot 0 Offset 0x60 Length 12
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 12
Memory Dump @0x000000B4607F0060
0000000000000000: 30000400 01000001 000c0031 0..........1
Slot 0 Column 1 Offset 0xb Length 1 Length (physical) 1
SomeText = 1--utility proc to dump pages
CREATE OR ALTER PROC #usp_DumpPages
@ObjectName nvarchar(247) --name of table or indexed view
, @IndexName sysname --name of index or NULL for heap
, @PartitionNumber int = NULL --partition number or NULL for all partitions
, @dbcc_page_print_option int = NULL --
, @page_count_limit int = 1
AS
SET NOCOUNT, XACT_ABORT ON;
DECLARE
@database_id int = DB_ID()
, @object_id int = OBJECT_ID(@ObjectName)
, @index_id int = CASE WHEN @IndexName IS NULL THEN 0 ELSE INDEXPROPERTY(OBJECT_ID(@ObjectName), @IndexName, 'IndexId') END --dump heap data pages when index name is null
, @allocated_page_file_id int
, @allocated_page_page_id int
, @page_level int
, @page_sequence int
, @page_count int = 0;
BEGIN TRY
IF @object_id IS NULL
BEGIN
RAISERROR('Specified object (%s) not found', 16, 0, @ObjectName);
END;
IF NOT EXISTS(SELECT * FROM sys.indexes WHERE object_id = @object_id AND index_id = @index_id)
BEGIN
RAISERROR('Specified index (%s) not found', 16, 0, @IndexName);
END;
--get data/index pages in logical order
DECLARE pages CURSOR LOCAL FAST_FORWARD FOR
WITH
pages AS (
SELECT
allocated_page_file_id
, allocated_page_page_id
, previous_page_file_id
, previous_page_page_id
, next_page_file_id
, next_page_page_id
, page_level
, 0 AS page_sequence
FROM sys.dm_db_database_page_allocations(@database_id, @object_id, @index_id, @PartitionNumber, 'DETAILED')
WHERE
page_type_desc IN ( 'DATA_PAGE', 'INDEX_PAGE' )
)
, chained_pages AS (
SELECT
allocated_page_file_id
, allocated_page_page_id
, next_page_file_id
, next_page_page_id
, page_level
, 1 AS page_sequence
FROM pages
WHERE
previous_page_page_id IS NULL --first page of each level
UNION ALL
SELECT
next_page.allocated_page_file_id
, next_page.allocated_page_page_id
, next_page.next_page_file_id
, next_page.next_page_page_id
, next_page.page_level
, chained_pages.page_sequence + 1 AS page_sequence
FROM pages AS next_page
JOIN chained_pages ON
next_page.allocated_page_file_id = chained_pages.next_page_file_id
AND next_page.allocated_page_page_id = chained_pages.next_page_page_id
)
SELECT
allocated_page_file_id
, allocated_page_page_id
, page_level
, page_sequence
FROM chained_pages
ORDER BY page_sequence
OPTION(MAXRECURSION 1000);
--return DBCC results to client
DBCC TRACEON(3604) WContext
StackExchange Database Administrators Q#332051, answer score: 3
Revisions (0)
No revisions yet.