patternsqlMinor
Can I bulk insert into an empty page-compressed table and get full compression?
Viewed 0 times
canfullinsertintocompressionemptybulkgetpageand
Problem
I have a lot of large tables (around 10 million wide rows) which need to be regularly loaded into SQL Server 2016 for read-only reporting. I would like these tables to be as small as possible on disk, and this matters more than performance improvements in either loading or querying.
Here is what I have been doing for the tables which require no further indexing:
Column types in the tables are varchar (never more than 512, not max), float, tinyint, or date (not datetime). All columns are created as nullable and no primary or foreign keys are defined -- they don't matter for the querying and the tables are never updated directly. Default collation on everything is
When I do this, I can see in
However, if I then rebuild with the same option
The questions
So my questions are: (a) what is going on here? and (b) is there a way to get this extra-small compressed size directly as I load the table without having to rebuild after the data is loaded?
Here is what I have been doing for the tables which require no further indexing:
- Create the table with
DATA_COMPRESSION=PAGE.
- Use bcp to bulk insert the data from a flat file into the new table.
Column types in the tables are varchar (never more than 512, not max), float, tinyint, or date (not datetime). All columns are created as nullable and no primary or foreign keys are defined -- they don't matter for the querying and the tables are never updated directly. Default collation on everything is
SQL_Latin1_General_CP1_CI_AS.When I do this, I can see in
sys.allocation_units that page data compression has been applied to the heap and I can see in sys.partitions that the fill factor is correctly 0 (100%). Since the tables are much smaller than uncompressed tables would be, I thought the compression was accomplished.However, if I then rebuild with the same option
DATA_COMPRESSION=PAGE, the supposedly-already-compressed table gets about 30% smaller! It looks like it's going from about 17 rows per data page to 25 rows per page. (Only once, though. Rebuilding again after that doesn't make it any smaller than the first rebuild did.) The questions
So my questions are: (a) what is going on here? and (b) is there a way to get this extra-small compressed size directly as I load the table without having to rebuild after the data is loaded?
Solution
@HandyD is entirely correct, I only want to highlight some other methods to get compression while inserting into a heap.
From the same document
When a heap is configured for page-level compression, pages receive
page-level compression only in the following ways:
According to this, you could leverage minimally logged bulk inserts or use
(a) what is going on here? and (b) is there a way to get this
extra-small compressed size directly as I load the table without
having to rebuild after the data is loaded?
There are rules to get
Testing
Example start data & BCP out command
The
The data size is at
The data size without any compression is ~
INSERT INTO ... WITH TABLOCK
Inserting
BULK INSERT
Now when we create a destination heap table that is also
The data gets
BCP IN WITH TABLOCK
You can get the same results as the
With the resulting size being
BCP IN WITHOUT TABLOCK
Using BCP to load data from the same file in a copy of the destination table
And a standard bcp command results into non compressed data:
With the data size at
From the same document
When a heap is configured for page-level compression, pages receive
page-level compression only in the following ways:
- Data is bulk imported with bulk optimizations enabled.
- Data is inserted using INSERT INTO ... WITH (TABLOCK) syntax and the table does not have a nonclustered index.
- A table is rebuilt by executing the ALTER TABLE ... REBUILD statement with the PAGE compression option.
According to this, you could leverage minimally logged bulk inserts or use
INSERT INTO ... WITH (TABLOCK) to get PAGE compression without having to do rebuilds. (a) what is going on here? and (b) is there a way to get this
extra-small compressed size directly as I load the table without
having to rebuild after the data is loaded?
There are rules to get
PAGE compression when inserting into a heap, add -h "TABLOCK" to your bcp command to get compression. ROW compression works without these prerequisites and is the least amount of compression used in below examples, thanks @DenisRubashkin for pointing that out!Testing
Example start data & BCP out command
--Tested on SQL Server 2014 SP2
CREATE TABLE dbo.CompressedHeap_Source( Val varchar(512),
Datefield Date,
Tinyfield TinyINT,
Floatfield float)
WITH (DATA_COMPRESSION = PAGE);
INSERT INTO dbo.CompressedHeap_Source
(
Val,Datefield,Tinyfield,Floatfield)
SELECT 'Bla',cast(getdate() as date),1,1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;
--bcp TEST.dbo.CompressedHeap_Source out E:\Data\HeapData.bcp -c -TThe
ROW compressed and Uncompressed sizeThe data size is at
132272 KB when doing a standard insert into the heap, this is ROW compressed but not PAGE compressed. The data size without any compression is ~
176216 KB for our test.exec sp_spaceused 'dbo.CompressedHeap_Source'
name rows reserved data index_size unused
CompressedHeap_Source 6365530 132296 KB 132272 KB 8 KB 16 KBINSERT INTO ... WITH TABLOCK
Inserting
WITH TABLOCK gives us the PAGE compressed data size, 69480 KB.INSERT INTO dbo.CompressedHeap_Source2 WITH(TABLOCK)
(
Val,Datefield,Tinyfield,Floatfield)
SELECT 'Bla',cast(getdate() as date),1,1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2BULK INSERT
Now when we create a destination heap table that is also
page compressed, and do a bulk insert with tablock:CREATE TABLE dbo.CompressedHeap_Destination( Val varchar(512),
Datefield Date,
Tinyfield TinyINT,
Floatfield float)
WITH (DATA_COMPRESSION = PAGE);
bulk insert dbo.CompressedHeap_Destination
from 'E:\Data\HeapData.bcp' with (TABLOCK)The data gets
page compressed and is also at 69480 KB:name rows reserved data index_size unused
CompressedHeap_Destination 6365530 69512 KB 69480 KB 8 KB 24 KBBCP IN WITH TABLOCK
You can get the same results as the
BULK INSERT WITH TABLOCK by using BCP IN with the -h "TABLOCK" hint. This makes sense, they do the same internally--bcp TEST.dbo.CompressedHeap_Destination2 IN E:\Data\HeapData.bcp -c -T -h "TABLOCK"With the resulting size being
69480 KBBCP IN WITHOUT TABLOCK
Using BCP to load data from the same file in a copy of the destination table
And a standard bcp command results into non compressed data:
--bcp TEST.dbo.CompressedHeap_Destination2 IN E:\Data\HeapData.bcp -c -TWith the data size at
132272 KB (row compressed).Code Snippets
--Tested on SQL Server 2014 SP2
CREATE TABLE dbo.CompressedHeap_Source( Val varchar(512),
Datefield Date,
Tinyfield TinyINT,
Floatfield float)
WITH (DATA_COMPRESSION = PAGE);
INSERT INTO dbo.CompressedHeap_Source
(
Val,Datefield,Tinyfield,Floatfield)
SELECT 'Bla',cast(getdate() as date),1,1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;
--bcp TEST.dbo.CompressedHeap_Source out E:\Data\HeapData.bcp -c -Texec sp_spaceused 'dbo.CompressedHeap_Source'
name rows reserved data index_size unused
CompressedHeap_Source 6365530 132296 KB 132272 KB 8 KB 16 KBINSERT INTO dbo.CompressedHeap_Source2 WITH(TABLOCK)
(
Val,Datefield,Tinyfield,Floatfield)
SELECT 'Bla',cast(getdate() as date),1,1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2CREATE TABLE dbo.CompressedHeap_Destination( Val varchar(512),
Datefield Date,
Tinyfield TinyINT,
Floatfield float)
WITH (DATA_COMPRESSION = PAGE);
bulk insert dbo.CompressedHeap_Destination
from 'E:\Data\HeapData.bcp' with (TABLOCK)name rows reserved data index_size unused
CompressedHeap_Destination 6365530 69512 KB 69480 KB 8 KB 24 KBContext
StackExchange Database Administrators Q#245178, answer score: 8
Revisions (0)
No revisions yet.