patternsqlModerate
Code creating clustered columnstore index while maintaining row order
Viewed 0 times
clusteredorderwhilecreatingcolumnstorecodeindexrowmaintaining
Problem
I want to convert a rowstore table to a columnstore table by creating a clustered columnstore index. There are three columns in the table: id,time, and value.
The table is ordered by id and time before creating columnstore index; however, after creating columnstore index, the row order is messed up. I thought it might be due to the parallelism and added the
Here is the code creating tables and indexs:
The screenshot of the query results:
The screenshot of the query results with columnstore index:
My understanding is that the order of rows is determined by the compression algorithm and there is nothing we can do about it, see the limitation and restriction in the document here with the following quote:
Cannot include the ASC or DESC keywords for sorting the index.
Columnstore indexes are ordered according to the compression
algorithms. Sorting would eliminate many of the performance benefits.
I'm using SQL Server 2016 Developer Edition on Windows 10 64bit.
The table is ordered by id and time before creating columnstore index; however, after creating columnstore index, the row order is messed up. I thought it might be due to the parallelism and added the
maxdop = 1 option, but that didn't fix the problem. Can anyone help me with this?Here is the code creating tables and indexs:
-- creating rowstore table
drop table if exists tab1_rstore
select id, time, value
into tab1_rstore
from tab0
order by id_loan, period
option(maxdop 1)
-- creating clustered index on rowstore table
create clustered index idx on tab1_rstore (id,time)
-- creating columnstore table
select *
into tab1_cstore
from tab1_rstore
option(maxdop 1)
-- comparing the first two rows from these two tables
select top 2 *
from tab1_rstore
select top 2 *
from tab1_cstoreThe screenshot of the query results:
-- creating clustered columnstore index
create clustered columnstore index idx on tab1_cstore
with (maxdop = 1)
-- comparing the top two rows again
select top 2 *
from tab1_rstore
select top 2 *
from tab1_cstoreThe screenshot of the query results with columnstore index:
My understanding is that the order of rows is determined by the compression algorithm and there is nothing we can do about it, see the limitation and restriction in the document here with the following quote:
Cannot include the ASC or DESC keywords for sorting the index.
Columnstore indexes are ordered according to the compression
algorithms. Sorting would eliminate many of the performance benefits.
I'm using SQL Server 2016 Developer Edition on Windows 10 64bit.
Solution
A clustered columnstore index is fundamentally different from a clustered rowstore index. You may have noticed there is no key column specification for a clustered columnstore index. That's right: a clustered columnstore index is an index with no keys - all columns are 'included'.
The most intuitive description I have heard for a clustered columnstore index is to think of it as a column-oriented heap table (where the 'RID' is
If you need indexes to support direct ordering and/or point/small range selections, you can create updateable rowstore b-tree indexes on top of clustered columnstore in SQL Server 2016.
In many cases this is simply not necessary, since columnstore access and batch mode sorting is so fast. Many of the things people 'know' about rowstore performance need to be relearned for columnstore. Scans and hashes are good :)
That said, of course columnstore has a structure to its row groups (and metadata about min/max values in each segment), which can be useful in queries that can benefit from row group/segment elimination.
One important technique in this area is to first create a clustered rowstore index with the desired ordering, then create the clustered columnstore index using the
Care is needed to maintain the benefits of row group/segment elimination over time. Also, while columnstore is already implicitly partitioned by row group, but you can explicitly partition it as well.
I'm not 100% sure what you're looking to test, but it is true that the 'order' of values within a segment is determined by the compression algorithm. My point about creating the columnstore index with
The most intuitive description I have heard for a clustered columnstore index is to think of it as a column-oriented heap table (where the 'RID' is
rowgroup_id, row_number).If you need indexes to support direct ordering and/or point/small range selections, you can create updateable rowstore b-tree indexes on top of clustered columnstore in SQL Server 2016.
In many cases this is simply not necessary, since columnstore access and batch mode sorting is so fast. Many of the things people 'know' about rowstore performance need to be relearned for columnstore. Scans and hashes are good :)
That said, of course columnstore has a structure to its row groups (and metadata about min/max values in each segment), which can be useful in queries that can benefit from row group/segment elimination.
One important technique in this area is to first create a clustered rowstore index with the desired ordering, then create the clustered columnstore index using the
WITH (DROP_EXISTING = ON, MAXDOP = 1) option. In your example:CREATE [UNIQUE] CLUSTERED INDEX idx
ON dbo.tab1_cstore (id, time)
WITH (MAXDOP = 1);
CREATE CLUSTERED COLUMNSTORE INDEX idx
ON dbo.tab1_cstore
WITH (DROP_EXISTING = ON, MAXDOP = 1);Care is needed to maintain the benefits of row group/segment elimination over time. Also, while columnstore is already implicitly partitioned by row group, but you can explicitly partition it as well.
I'm not 100% sure what you're looking to test, but it is true that the 'order' of values within a segment is determined by the compression algorithm. My point about creating the columnstore index with
DROP_EXISTING is about the ordering of data flowing into the segment creation process, so that segments overall will be ordered in a particular way. Within the segment, all bets are off.Code Snippets
CREATE [UNIQUE] CLUSTERED INDEX idx
ON dbo.tab1_cstore (id, time)
WITH (MAXDOP = 1);
CREATE CLUSTERED COLUMNSTORE INDEX idx
ON dbo.tab1_cstore
WITH (DROP_EXISTING = ON, MAXDOP = 1);Context
StackExchange Database Administrators Q#158651, answer score: 15
Revisions (0)
No revisions yet.