patternsqlMinor
Clustered columnstore index space usage
Viewed 0 times
clusteredspacecolumnstoreusageindex
Problem
I have a simple table with a clustered columnstore index:
This table has some billion rows and according to
My problem is I can't account for all of this space. Querying
The tuple mover has done its job and all row groups are in the compressed state. I already tried an
My only idea for the size difference is some dictionary-like stuff I'm not accounting for. Any ideas on what I might be missing?
I'm running SQL Server 2017 (RTM-CU4).
EDIT 1:
This is the output from sp_spaceused for the table in question:
EDIT 2:
This is a repro script with 1 million rows. It runs in about 1 minute on my machine. warning: it drops and recreates a new database
```
USE master;
GO
DROP DATABASE IF EXISTS MyDbWeirdTest;
GO
CREATE DATABASE MyDbWeirdTest;
GO
USE MyDbWeirdTest;
GO
CREATE TABLE IdsBin (
ID INT NOT NULL,
Hash BINARY(20) NOT NULL
);
CREATE CLUSTERED COLUMNSTORE INDEX ix1 ON IdsBin
GO
CREATE TYPE tBin AS TABLE (
ID INT,
Hash BINARY(20)
);
GO
CREATE OR ALTER PROCEDURE pBin (
@ids AS dbo.tBin READONLY
)
AS
BEGIN
SET NOCOUNT ON;
INSERT dbo.IdsBin
SELECT ID, Hash FROM @ids;
END;
GO
SET NOCOUNT ON;
DECLARE @i INT = 1, @t INT = 1;
ID INT NOT NULL,
Hash BINARY(20) NOT NULLThis table has some billion rows and according to
sp_spaceused, sys.allocation_units and SSMS reports, its size is about 25GB.My problem is I can't account for all of this space. Querying
sys.column_store_row_groups and sys.column_store_segments only gives me about 7,8GB. The index uses no dictionary: primary_dictionary_id and secondary_dictionary_id are -1 for all segments. Querying sys.column_store_dictionaries returns no rows at all.The tuple mover has done its job and all row groups are in the compressed state. I already tried an
ALTER INDEX REORGANIZE just in case.My only idea for the size difference is some dictionary-like stuff I'm not accounting for. Any ideas on what I might be missing?
I'm running SQL Server 2017 (RTM-CU4).
EDIT 1:
This is the output from sp_spaceused for the table in question:
+--------+------------+-------------+-------------+------------+----------+
| name | rows | reserved | data | index_size | unused |
+--------+------------+-------------+-------------+------------+----------+
| IdsBin | 1073741824 | 25028112 KB | 25007432 KB | 16 KB | 20664 KB |
+--------+------------+-------------+-------------+------------+----------+EDIT 2:
This is a repro script with 1 million rows. It runs in about 1 minute on my machine. warning: it drops and recreates a new database
```
USE master;
GO
DROP DATABASE IF EXISTS MyDbWeirdTest;
GO
CREATE DATABASE MyDbWeirdTest;
GO
USE MyDbWeirdTest;
GO
CREATE TABLE IdsBin (
ID INT NOT NULL,
Hash BINARY(20) NOT NULL
);
CREATE CLUSTERED COLUMNSTORE INDEX ix1 ON IdsBin
GO
CREATE TYPE tBin AS TABLE (
ID INT,
Hash BINARY(20)
);
GO
CREATE OR ALTER PROCEDURE pBin (
@ids AS dbo.tBin READONLY
)
AS
BEGIN
SET NOCOUNT ON;
INSERT dbo.IdsBin
SELECT ID, Hash FROM @ids;
END;
GO
SET NOCOUNT ON;
DECLARE @i INT = 1, @t INT = 1;
Solution
sys.column_store_segments and sys.column_store_row_groups store some of the metadata information about the ColumnStore data, but I believe that ends up representing the compressed size. There are LOB structures that are allocated as well, and you can see the uncompressed size in the allocation unit / partition DMVs (and if you get to the pages somehow with DBCC PAGE, I bet you would see that they're relatively empty). In other words, sys.column_store_row_groups shows you how much data is stored on those pages, but doesn't add in the free space on those pages (which still take up space in the data file and in memory, like a fragmented index or an index with a really low fill factor).I ran your repro and here is what I saw:
SELECT
a.[type_desc],
p.[rows],
a.total_pages, reserved_kb = a.total_pages * 8,
a.used_pages, data_kb = a.used_pages * 8
FROM sys.allocation_units AS a
INNER JOIN sys.partitions AS p
ON a.container_id = p.[partition_id]
WHERE p.[object_id] = OBJECT_ID(N'dbo.IdsBin');Results:
Other than unused, which I didn't bother calculating, the results match up with
sp_spaceused:And you can confirm these numbers in
sys.dm_db_partition_stats as well:SELECT
lob_reserved_page_count, reserved_kb = lob_reserved_page_count * 8,
lob_used_page_count, data_kb = lob_used_page_count * 8
FROM sys.dm_db_partition_stats
WHERE [object_id] = OBJECT_ID(N'dbo.IdsBin');Results:
You mentioned this but I just wanted to call out explicitly that
sp_spaceused and allocation_units reflect the page count, regardless of how full or empty any page might be. The row groups DMV is reflecting just the actual data. The documentation states (emphasis mine):Size in bytes of all the data in this row group (not including metadata or shared dictionaries)
Whereas
sys.dm_db_partition_stats, for example, explicitly states pages throughout, not data, although I would argue that they should specify that each LOB is an 8K page here:Total number of LOBs used to store and manage columnstore index in the partition.
Which number you want to trust, well, that's up to you.
As an aside, Niko Neugebauer talked about the fact that only the compressed size of the dictionaries is exposed in the columnstore DMVs here, and raised a feedback item about it. Seems to me there is other information that could be exposed in the columnstore DMVs too.
Code Snippets
SELECT
a.[type_desc],
p.[rows],
a.total_pages, reserved_kb = a.total_pages * 8,
a.used_pages, data_kb = a.used_pages * 8
FROM sys.allocation_units AS a
INNER JOIN sys.partitions AS p
ON a.container_id = p.[partition_id]
WHERE p.[object_id] = OBJECT_ID(N'dbo.IdsBin');SELECT
lob_reserved_page_count, reserved_kb = lob_reserved_page_count * 8,
lob_used_page_count, data_kb = lob_used_page_count * 8
FROM sys.dm_db_partition_stats
WHERE [object_id] = OBJECT_ID(N'dbo.IdsBin');Context
StackExchange Database Administrators Q#202450, answer score: 3
Revisions (0)
No revisions yet.