principlesqlMinor
Clustered Index fragmentation vs Index with Included columns fragmentation
Viewed 0 times
clusteredfragmentationcolumnswithincludedindex
Problem
I have a relative large table (for me) with 40 million rows that is expected to grow to 80 to 120 million rows in a period of two weeks (during a campaign).
So I'm hosting on SQL Azure and Azure already recommended adding an Index with included columns.
I was always hesitant on using UserId,TipIndex as a Clustered Index, as Tips will be added at random. Meaning that I'm afraid of huge fragmentation problems etc.
My questions:
I know in the end the answer is always "depends" or that I should measure it. But as I'm a solo developer and not with a lot of resources I am hoping on someone with more experience that has a gut feeling for this, so my first attempts have a higher chance of going in the right direction.
Tip
--------------
Id int (clustered index)
UserId int
TipIndex smallint
Value binary(8)
LastChanged datetime2(3)- every user has between 1 to 400 tips that will be inserted at random during this period
- UserId + TipIndex is unique
- I never query Tips directly on the Identity Key (Id)
- 99% of the times I query on UserId
- I need all columns
- I often query on 1 UserId (per page view), sometimes on a batches of 10.000s for statistics
- this is a high traffic site during this period and it should be able to handle 30.000 queries on UserId per minute
- Id is at the moment my Clustered Index, because I read that it leads to the least fragmentation.
So I'm hosting on SQL Azure and Azure already recommended adding an Index with included columns.
I was always hesitant on using UserId,TipIndex as a Clustered Index, as Tips will be added at random. Meaning that I'm afraid of huge fragmentation problems etc.
My questions:
- Doesn't an Index with included columns have the exact same problem?
- Is a table with included columns not just the same as a "shadow table" with the same fragmentation problems?
- Should I migrate to use UserId, TipIndex as a ClusteredIndex instead of Id?
- How to prevent fragmentation?
I know in the end the answer is always "depends" or that I should measure it. But as I'm a solo developer and not with a lot of resources I am hoping on someone with more experience that has a gut feeling for this, so my first attempts have a higher chance of going in the right direction.
Solution
Your questions:
Doesn't an Index with included columns have the exact same problem?
Yes.
Is a table with included columns not just the same as a "shadow table" with the same fragmentation problems?
Yes
Should I migrate to use UserId, TipIndex as a ClusteredIndex instead of Id?
I would, yes.
How to prevent fragmentation?
There are a couple of different types of fragmentation to consider. One is when you only have part of your pages being used because you’ve had page splits. If you have a lot of inserts, this will happen. Don’t stress too much. The other is when you have pages where the subsequent page is in a different extent. Again, I wouldn’t worry too much. If your data is mostly in the buffer cache, it doesn’t really matter if it moves across extents.
So... don’t worry about it too much. But don’t bother having a complete copy of the data in a way that you won’t actually be querying it.
Doesn't an Index with included columns have the exact same problem?
Yes.
Is a table with included columns not just the same as a "shadow table" with the same fragmentation problems?
Yes
Should I migrate to use UserId, TipIndex as a ClusteredIndex instead of Id?
I would, yes.
How to prevent fragmentation?
There are a couple of different types of fragmentation to consider. One is when you only have part of your pages being used because you’ve had page splits. If you have a lot of inserts, this will happen. Don’t stress too much. The other is when you have pages where the subsequent page is in a different extent. Again, I wouldn’t worry too much. If your data is mostly in the buffer cache, it doesn’t really matter if it moves across extents.
So... don’t worry about it too much. But don’t bother having a complete copy of the data in a way that you won’t actually be querying it.
Context
StackExchange Database Administrators Q#282281, answer score: 6
Revisions (0)
No revisions yet.