HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to control Segmentation min/max data_id on a non-clustered ColumnStore index

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
clusteredcontrolnonsegmentationcolumnstoreminmaxhowindexdata_id

Problem

Given a simple row-based table without a PK but with a row-based clustered index like so:

create clustered index [CX_PropertyValue] ON [dbo].[PropertyValue] ([PropertyId], [Value])


Then I wish to add a column store index that is segmented in the same order as the clustered index above:

create nonclustered columnstore index CS_IX_PropertyValue on dbo.PropertyValue( 
    PropertyId, Value
)
with (drop_existing = on, maxdop = 1); -- maxdop=1 to preserve the order by property


MaxDop hint to preserve order came from: here

Then the following query was used to report the min/max data_id for the PropertyId column and it the full range was reported on each of the 7 segments:

```
create view [Common].[ColumnStoreSegmentationView]
as
/*---------------------------------------------------------------------------------------------------------------------
Purpose: List ColumnStore table segment min/max of columns.

Source: https://joyfulcraftsmen.com/blog/cci-how-to-load-data-for-better-columnstore-segment-elimination/
https://dba.stackexchange.com/a/268329/9415

Modified By Description
---------- ---------- -----------------------------------------------------------------------------------------
2020.06.02 crokusek/inet Initial Version
---------------------------------------------------------------------------------------------------------------------*/
select --top 20000000000
s.Name as SchemaName,
t.Name as TableName,
i.Name as IndexName,
c.name as ColumnName,
c.column_id as ColumnId,
cs.segment_id as SegmentId,
cs.min_data_id as MinValue,
cs.max_data_id as MaxValue
from sys.schemas s
join sys.tables t
on t.schema_id = s.schema_id
join sys.partitions as p
on p.object_id = t.object_id
join sys.indexes as I
on i.object_id = p.object_id
and i.index_id = p.index_id
join sys.index_columns as ic
on ic.[object_id] = I.

Solution

This isn't directly supported for nonclustered columnstore indexes.

It does work for clustered columnstore.

Azure Synapse Analytics has language support for doing it in one step e.g.:

CREATE CLUSTERED COLUMNSTORE INDEX 
ON dbo.PropertyValue
ORDER (PropertyId, Value);


This syntax has not yet made it to the SQL Server box product, though it is available under an undocumented feature flag so perhaps it isn't far away. It still won't work on a nonclustered columnstore index though.

General Workaround

The best you can do is to create the nonclustered rowstore index with MAXDOP = 1, then replace it with a nonclustered columnstore index with MAXDOP = 1 and DROP_EXISTING = ON.

This isn't guaranteed to preserve the ordering as you want, but it is highly likely:

CREATE NONCLUSTERED INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);

CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (DROP_EXISTING = ON, MAXDOP = 1);


This will give you your best chance of achieving rowgroup elimination when filtering on PropertyId.

Special Case

When the desired ordering matches the rowstore clustered index (as appears to be the case in the question), there is no need to create a rowstore nonclustered index first. The documentation says:


Note, for nonclustered columnstore index (NCCI), if the base rowstore table has a clustered index, the rows are already ordered. In this case, the resultant nonclustered columnstore index will automatically be ordered.

So, in your case, it should be enough to run only:

CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);


See this db<>fiddle demo.

Metadata

You can see the min and max values for each rowgroup and column using:

SELECT
    CSS.column_id,
    column_name = C.[name],
    rowgroup_id = CSS.segment_id,
    CSS.min_data_id,
    CSS.max_data_id,
    CSS.row_count
FROM sys.partitions AS P
JOIN sys.column_store_segments AS CSS
    ON CSS.hobt_id = P.hobt_id
JOIN sys.indexes AS I
    ON I.[object_id] = P.[object_id]
    AND I.index_id = P.index_id
JOIN sys.index_columns AS IC
    ON IC.[object_id] = I.[object_id]
    AND IC.index_id = I.index_id
    AND IC.index_column_id = CSS.column_id
JOIN sys.columns AS C
    ON C.[object_id] = P.[object_id]
    AND C.column_id = IC.column_id
WHERE
    P.[object_id] = OBJECT_ID(N'dbo.PropertyValue', N'U')
ORDER BY
    C.column_id,
    CSS.segment_id;

Code Snippets

CREATE CLUSTERED COLUMNSTORE INDEX <index_name>
ON dbo.PropertyValue
ORDER (PropertyId, Value);
CREATE NONCLUSTERED INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);

CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (DROP_EXISTING = ON, MAXDOP = 1);
CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);
SELECT
    CSS.column_id,
    column_name = C.[name],
    rowgroup_id = CSS.segment_id,
    CSS.min_data_id,
    CSS.max_data_id,
    CSS.row_count
FROM sys.partitions AS P
JOIN sys.column_store_segments AS CSS
    ON CSS.hobt_id = P.hobt_id
JOIN sys.indexes AS I
    ON I.[object_id] = P.[object_id]
    AND I.index_id = P.index_id
JOIN sys.index_columns AS IC
    ON IC.[object_id] = I.[object_id]
    AND IC.index_id = I.index_id
    AND IC.index_column_id = CSS.column_id
JOIN sys.columns AS C
    ON C.[object_id] = P.[object_id]
    AND C.column_id = IC.column_id
WHERE
    P.[object_id] = OBJECT_ID(N'dbo.PropertyValue', N'U')
ORDER BY
    C.column_id,
    CSS.segment_id;

Context

StackExchange Database Administrators Q#268328, answer score: 7

Revisions (0)

No revisions yet.