HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Why would a table with a Clustered Columnstore Index have many open rowgroups?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
clusteredwhywithhavecolumnstoreopenwouldrowgroupsmanyindex

Problem

I was experiencing some performance issues with a query yesterday and upon further investigation, I noticed what I believe is odd behavior with a clustered columnstore index that I'm trying to get to the bottom of.

The table is

CREATE TABLE [dbo].[NetworkVisits](
    [SiteId] [int] NOT NULL,
    [AccountId] [int] NOT NULL,
    [CreationDate] [date] NOT NULL,
    [UserHistoryId] [int] NOT NULL
)


with the index:

CREATE CLUSTERED COLUMNSTORE INDEX [CCI_NetworkVisits] 
   ON [dbo].[NetworkVisits] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]


The table currently has 1.3 Billion rows in it and we are constantly inserting new rows to it. When I say constantly, I mean all the time. It's a steady stream of inserting one row at a time to the table.

Insert Into NetworkVisits (SiteId, AccountId, CreationDate, UserHistoryId)
Values (@SiteId, @AccountId, @CreationDate, @UserHistoryId)


Execution plan here

I also have a scheduled job that runs every 4 hours to delete duplicate rows from the table. The query is:

With NetworkVisitsRows
  As (Select SiteId, UserHistoryId, Row_Number() Over (Partition By SiteId, UserHistoryId
                                    Order By CreationDate Asc) RowNum
        From NetworkVisits
       Where CreationDate > GETUTCDATE() - 30)
DELETE
FROM NetworkVisitsRows
WHERE RowNum > 1
Option (MaxDop 48)


The execution plan has been pasted here.

While digging into the issue, I noticed that the NetworkVisits table had roughly 2000 rowgroups in it, with about 800 of them being in an open state and no where near the max allowed (1048576). Here is a small sample of what I was seeing:

I ran a reorganize on the index, which compressed all but 1 rowgroup, but this morning I checked again and we again have multiple open rowgroups - the one that was created yesterday after the reorganize, then 3 others each created roughly around the time the deletion job ran:

```
TableName IndexName ty

Solution

With constant trickle inserts, you very well may end up with numerous open deltastore rowgroups. The reason for this is that when an insert starts, a new rowgroup is created if all of the existing ones are locked. From Stairway to Columnstore Indexes Level 5: Adding New Data To Columnstore Indexes


Any insert of 102,399 or fewer rows is considered a "trickle insert". These rows are added to an open deltastore if one is available (and not locked), or else a new deltastore rowgroup is created for them.

In general, the columnstore index design is optimized for bulk inserts, and when using trickle inserts you'll need to run the reorg on a periodic basis.

Another option, recommended in the Microsoft documentation, is to trickle into a staging table (heap), and when it gets over 102,400 rows, insert those rows into the columstore index. See Columnstore indexes - Data loading guidance

In any case, after deleting a lot of data, a reorg is recommended on a columnstore index so that the data will actually be deleted, and the resulting deltastore rowgroups will get cleaned up.

Context

StackExchange Database Administrators Q#255985, answer score: 13

Revisions (0)

No revisions yet.