HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Clustered columnstore index space usage

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
clusteredspacecolumnstoreusageindex

Problem

I have a simple table with a clustered columnstore index:

ID INT NOT NULL,
Hash BINARY(20) NOT NULL


This table has some billion rows and according to sp_spaceused, sys.allocation_units and SSMS reports, its size is about 25GB.

My problem is I can't account for all of this space. Querying sys.column_store_row_groups and sys.column_store_segments only gives me about 7,8GB. The index uses no dictionary: primary_dictionary_id and secondary_dictionary_id are -1 for all segments. Querying sys.column_store_dictionaries returns no rows at all.

The tuple mover has done its job and all row groups are in the compressed state. I already tried an ALTER INDEX REORGANIZE just in case.

My only idea for the size difference is some dictionary-like stuff I'm not accounting for. Any ideas on what I might be missing?

I'm running SQL Server 2017 (RTM-CU4).

EDIT 1:

This is the output from sp_spaceused for the table in question:

+--------+------------+-------------+-------------+------------+----------+
|  name  |    rows    |  reserved   |    data     | index_size |  unused  |
+--------+------------+-------------+-------------+------------+----------+
| IdsBin | 1073741824 | 25028112 KB | 25007432 KB | 16 KB      | 20664 KB |
+--------+------------+-------------+-------------+------------+----------+


EDIT 2:

This is a repro script with 1 million rows. It runs in about 1 minute on my machine. warning: it drops and recreates a new database

```
USE master;
GO
DROP DATABASE IF EXISTS MyDbWeirdTest;
GO
CREATE DATABASE MyDbWeirdTest;
GO
USE MyDbWeirdTest;
GO

CREATE TABLE IdsBin (
ID INT NOT NULL,
Hash BINARY(20) NOT NULL
);
CREATE CLUSTERED COLUMNSTORE INDEX ix1 ON IdsBin
GO

CREATE TYPE tBin AS TABLE (
ID INT,
Hash BINARY(20)
);
GO

CREATE OR ALTER PROCEDURE pBin (
@ids AS dbo.tBin READONLY
)
AS
BEGIN
SET NOCOUNT ON;

INSERT dbo.IdsBin
SELECT ID, Hash FROM @ids;
END;
GO

SET NOCOUNT ON;
DECLARE @i INT = 1, @t INT = 1;

Solution

sys.column_store_segments and sys.column_store_row_groups store some of the metadata information about the ColumnStore data, but I believe that ends up representing the compressed size. There are LOB structures that are allocated as well, and you can see the uncompressed size in the allocation unit / partition DMVs (and if you get to the pages somehow with DBCC PAGE, I bet you would see that they're relatively empty). In other words, sys.column_store_row_groups shows you how much data is stored on those pages, but doesn't add in the free space on those pages (which still take up space in the data file and in memory, like a fragmented index or an index with a really low fill factor).

I ran your repro and here is what I saw:

SELECT 
  a.[type_desc], 
  p.[rows],
  a.total_pages, reserved_kb = a.total_pages * 8, 
  a.used_pages,  data_kb     = a.used_pages  * 8
FROM sys.allocation_units AS a
INNER JOIN sys.partitions AS p 
   ON a.container_id = p.[partition_id]
WHERE p.[object_id] = OBJECT_ID(N'dbo.IdsBin');


Results:

Other than unused, which I didn't bother calculating, the results match up with sp_spaceused:

And you can confirm these numbers in sys.dm_db_partition_stats as well:

SELECT 
  lob_reserved_page_count, reserved_kb = lob_reserved_page_count * 8,
  lob_used_page_count,     data_kb     = lob_used_page_count     * 8
FROM sys.dm_db_partition_stats 
WHERE [object_id] = OBJECT_ID(N'dbo.IdsBin');


Results:

You mentioned this but I just wanted to call out explicitly that sp_spaceused and allocation_units reflect the page count, regardless of how full or empty any page might be. The row groups DMV is reflecting just the actual data. The documentation states (emphasis mine):

Size in bytes of all the data in this row group (not including metadata or shared dictionaries)

Whereas sys.dm_db_partition_stats, for example, explicitly states pages throughout, not data, although I would argue that they should specify that each LOB is an 8K page here:

Total number of LOBs used to store and manage columnstore index in the partition.

Which number you want to trust, well, that's up to you.

As an aside, Niko Neugebauer talked about the fact that only the compressed size of the dictionaries is exposed in the columnstore DMVs here, and raised a feedback item about it. Seems to me there is other information that could be exposed in the columnstore DMVs too.

Code Snippets

SELECT 
  a.[type_desc], 
  p.[rows],
  a.total_pages, reserved_kb = a.total_pages * 8, 
  a.used_pages,  data_kb     = a.used_pages  * 8
FROM sys.allocation_units AS a
INNER JOIN sys.partitions AS p 
   ON a.container_id = p.[partition_id]
WHERE p.[object_id] = OBJECT_ID(N'dbo.IdsBin');
SELECT 
  lob_reserved_page_count, reserved_kb = lob_reserved_page_count * 8,
  lob_used_page_count,     data_kb     = lob_used_page_count     * 8
FROM sys.dm_db_partition_stats 
WHERE [object_id] = OBJECT_ID(N'dbo.IdsBin');

Context

StackExchange Database Administrators Q#202450, answer score: 3

Revisions (0)

No revisions yet.