patternMinor
Clustered columnstore index on small tables
Viewed 0 times
clusteredtablescolumnstoresmallindex
Problem
Clustered column store indexed tables in general are useful for large tables. Ideally with million of rows.
And also useful with queries, which selects only the subset of available columns in such tables.
What happens if we break these two "rules"/best practices?
My tests don't reveal any performance degradation comparing to row stored clustered index table.
Which is great in our case.
Is there any "long term" effects breaking these two rules? Or any hidden pitfalls which haven't showed up just yet?
Context why is it needed: I designed a database model which will be used for many instances of different vendor databases. The schema remains the same in every database, but different vendors have different amount of data. Hence few small vendors may end up with small amount of data (<1 000 000) in their tables. I can't allow myself to keep up two different database for row-store and column-store model.
And also useful with queries, which selects only the subset of available columns in such tables.
What happens if we break these two "rules"/best practices?
- Like having a clustered column store indexed table which will only store few thousand, or hundreds of thousands of rows max.
- And running queries against those clustered column store table where all the columns are needed.
My tests don't reveal any performance degradation comparing to row stored clustered index table.
Which is great in our case.
Is there any "long term" effects breaking these two rules? Or any hidden pitfalls which haven't showed up just yet?
Context why is it needed: I designed a database model which will be used for many instances of different vendor databases. The schema remains the same in every database, but different vendors have different amount of data. Hence few small vendors may end up with small amount of data (<1 000 000) in their tables. I can't allow myself to keep up two different database for row-store and column-store model.
Solution
Columnstore index has a big advantage on compress data size. The general aim of the Columnstore index is quickly read a bunch of data due to its compression.
CCI is Columnstore Clustered Index, Clustered is Clustered Index
Columnstore Index compresses data size from 4MB to 2MB.
We can look at the performance in two tables and three parts.
The first one is the minimal
Result: There is a
The second one is the minimal
Result: There are read, CPU, and time differences as we can see.
The third one is the
If I rebuild and update all rows for index fragmentation, I will see a more fragmented Clustered Columnstore Index than the Clustered Index. And I didn't show, but the
Like the document says
If you have a small table you don't need to Columnstore index.
CCI is Columnstore Clustered Index, Clustered is Clustered Index
Columnstore Index compresses data size from 4MB to 2MB.
We can look at the performance in two tables and three parts.
The first one is the minimal
SELECT operation:SELECT * FROM Users_CCI WITH(INDEX=CCI_Users) WHERE Id=12333SELECT * FROM Users_Clustered WHERE Id=12333Result: There is a
Columnstore Scan here and wrong estimated. Further, logical read difference. You can say it doesn't matter for you, but, If you use a minimal SELECT query, probably use thousands of times. And it will affect total performance.The second one is the minimal
UPDATE operation:UPDATE Users_CCI SET Age=10 WHERE Id=2UPDATE Users_Clustered SET Age=10 WHERE Id=2Result: There are read, CPU, and time differences as we can see.
The third one is the
REBUILD operation:USE [StackOverflow2013]
GO
ALTER INDEX [CCI_Users] ON [dbo].[Users_CCI] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE)
GO
USE [StackOverflow2013]
GO
ALTER INDEX [PK_Users_Clustered_Id] ON [dbo].[Users_Clustered] REBUILD PARTITION = ALL WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
UPDATE Users_CCI SET Age=10
UPDATE Users_Clustered SET Age=10If I rebuild and update all rows for index fragmentation, I will see a more fragmented Clustered Columnstore Index than the Clustered Index. And I didn't show, but the
Clustered Columnstore Index Rebuild process produces transaction log more than Clustered Index.Like the document says
- More than 10% of the operations on the table are updates and deletes. Large numbers of updates and deletes cause fragmentation.The fragmentation affects compression rates and query performance until you run an operation called reorganize that forces all data into the columnstore and removes fragmentation. For more information, see Minimizing index fragmentation in columnstore index.
If you have a small table you don't need to Columnstore index.
Code Snippets
SELECT * FROM Users_CCI WITH(INDEX=CCI_Users) WHERE Id=12333SELECT * FROM Users_Clustered WHERE Id=12333UPDATE Users_CCI SET Age=10 WHERE Id=2UPDATE Users_Clustered SET Age=10 WHERE Id=2USE [StackOverflow2013]
GO
ALTER INDEX [CCI_Users] ON [dbo].[Users_CCI] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE)
GO
USE [StackOverflow2013]
GO
ALTER INDEX [PK_Users_Clustered_Id] ON [dbo].[Users_Clustered] REBUILD PARTITION = ALL WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
UPDATE Users_CCI SET Age=10
UPDATE Users_Clustered SET Age=10Context
StackExchange Database Administrators Q#280157, answer score: 4
Revisions (0)
No revisions yet.