HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Clustered columnstore index on small tables

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
clusteredtablescolumnstoresmallindex

Problem

Clustered column store indexed tables in general are useful for large tables. Ideally with million of rows.
And also useful with queries, which selects only the subset of available columns in such tables.

What happens if we break these two "rules"/best practices?

  • Like having a clustered column store indexed table which will only store few thousand, or hundreds of thousands of rows max.



  • And running queries against those clustered column store table where all the columns are needed.



My tests don't reveal any performance degradation comparing to row stored clustered index table.
Which is great in our case.

Is there any "long term" effects breaking these two rules? Or any hidden pitfalls which haven't showed up just yet?

Context why is it needed: I designed a database model which will be used for many instances of different vendor databases. The schema remains the same in every database, but different vendors have different amount of data. Hence few small vendors may end up with small amount of data (<1 000 000) in their tables. I can't allow myself to keep up two different database for row-store and column-store model.

Solution

Columnstore index has a big advantage on compress data size. The general aim of the Columnstore index is quickly read a bunch of data due to its compression.

CCI is Columnstore Clustered Index, Clustered is Clustered Index

Columnstore Index compresses data size from 4MB to 2MB.

We can look at the performance in two tables and three parts.
The first one is the minimal SELECT operation:

SELECT * FROM Users_CCI WITH(INDEX=CCI_Users) WHERE Id=12333


SELECT * FROM Users_Clustered WHERE Id=12333


Result: There is a Columnstore Scan here and wrong estimated. Further, logical read difference. You can say it doesn't matter for you, but, If you use a minimal SELECT query, probably use thousands of times. And it will affect total performance.
The second one is the minimal UPDATE operation:

UPDATE Users_CCI SET Age=10 WHERE  Id=2


UPDATE Users_Clustered SET Age=10 WHERE  Id=2


Result: There are read, CPU, and time differences as we can see.
The third one is the REBUILD operation:

USE [StackOverflow2013]
GO
ALTER INDEX [CCI_Users] ON [dbo].[Users_CCI] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE)
GO
USE [StackOverflow2013]
GO
ALTER INDEX [PK_Users_Clustered_Id] ON [dbo].[Users_Clustered] REBUILD PARTITION = ALL WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

UPDATE Users_CCI SET Age=10 
UPDATE Users_Clustered SET Age=10


If I rebuild and update all rows for index fragmentation, I will see a more fragmented Clustered Columnstore Index than the Clustered Index. And I didn't show, but the Clustered Columnstore Index Rebuild process produces transaction log more than Clustered Index.

Like the document says

  • More than 10% of the operations on the table are updates and deletes. Large numbers of updates and deletes cause fragmentation.The fragmentation affects compression rates and query performance until you run an operation called reorganize that forces all data into the columnstore and removes fragmentation. For more information, see Minimizing index fragmentation in columnstore index.



If you have a small table you don't need to Columnstore index.

Code Snippets

SELECT * FROM Users_CCI WITH(INDEX=CCI_Users) WHERE Id=12333
SELECT * FROM Users_Clustered WHERE Id=12333
UPDATE Users_CCI SET Age=10 WHERE  Id=2
UPDATE Users_Clustered SET Age=10 WHERE  Id=2
USE [StackOverflow2013]
GO
ALTER INDEX [CCI_Users] ON [dbo].[Users_CCI] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE)
GO
USE [StackOverflow2013]
GO
ALTER INDEX [PK_Users_Clustered_Id] ON [dbo].[Users_Clustered] REBUILD PARTITION = ALL WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

UPDATE Users_CCI SET Age=10 
UPDATE Users_Clustered SET Age=10

Context

StackExchange Database Administrators Q#280157, answer score: 4

Revisions (0)

No revisions yet.