patternsqlMinor
Fragmented clustered primary key (sequential GUID) index after processing - SQL Server
Viewed 0 times
guidclusteredafterprimarysqlfragmentedsequentialserverindexprocessing
Problem
I'm in the process of evaluating a schema change on the primary keys of several of our biggest tables in our software.
Till now we used varchar(36) for all of our PKs and FKs and Hibernate generated the random GUIDs in insert operations.
Moreover SQL server just makes all those columns clustered indexes also, so I'm worried now (after researching about) for the index fragmentation that all those inserts (and we have a lot of inserts and deletes and updates) will cause to the indexes.
A query made on the tables showed that almost all indexes are heavily fragmented.
So I decided to change the column type for several of our biggest tables (between 3 million and 1 million, so not that big) from varchar(36) to unique identifier with the DEFAULT option newsequentialid().
I did a test yesterday on a much smaller db: after changing those PKs and after a big processing (about 20k inserts and I think the same amount of deletes) it shows an index fragmentation for the clustered keys (id) from 30% to 40%.
How is this possible?
Of course I rebuilt the indexes before the processing.
These are the fragmentation data of ax example table (46528 rows) -> 42% fragmentation:
Now I'm worried that this schema change won't help that much.
EDIT: I think it's important to note that the fill factor on the tables is set to 80 and that after the insert it's very possible that the row gets also u
Till now we used varchar(36) for all of our PKs and FKs and Hibernate generated the random GUIDs in insert operations.
Moreover SQL server just makes all those columns clustered indexes also, so I'm worried now (after researching about) for the index fragmentation that all those inserts (and we have a lot of inserts and deletes and updates) will cause to the indexes.
A query made on the tables showed that almost all indexes are heavily fragmented.
So I decided to change the column type for several of our biggest tables (between 3 million and 1 million, so not that big) from varchar(36) to unique identifier with the DEFAULT option newsequentialid().
I did a test yesterday on a much smaller db: after changing those PKs and after a big processing (about 20k inserts and I think the same amount of deletes) it shows an index fragmentation for the clustered keys (id) from 30% to 40%.
How is this possible?
Of course I rebuilt the indexes before the processing.
These are the fragmentation data of ax example table (46528 rows) -> 42% fragmentation:
Schema Table Index avg_fragmentation_in_percent page_count fragment_count
dbo e2_per_out_presenze PK_e2_per_out_presenze_id 42,91 3714 1704
dbo e2_per_out_presenze FK_e2_per_out_presenze_2 1,78 448 36
dbo e2_per_out_presenze FK_e2_per_out_presenze_3 0 194 11
dbo e2_per_out_presenze FK_e2_per_out_presenze_4 0 444 12
dbo e2_per_out_presenze FK_e2_per_out_presenze_6 0 161 16
dbo e2_per_out_presenze FK_e2_per_out_presenze_7 0 166 14
dbo e2_per_out_presenze Index_8 0 466 28Now I'm worried that this schema change won't help that much.
EDIT: I think it's important to note that the fill factor on the tables is set to 80 and that after the insert it's very possible that the row gets also u
Solution
Now I'm worried that this schema change won't help that much.
Changing the column from a
It appears from a cursory glance at your sample table above that the average row size is somewhere around 600 bytes. Converting the
As a bonus, once you convert the column from a
Having said that, I decided to try to recreate the problems you're seeing with index fragmentation. I created a test table with the
The fragmentation query results:
```
/ rebuild the index to remove fragmentation /
ALTER INDEX PK_guid_insert ON dbo.guid_insert REBUILD;
/* drop the primary key, since w
Changing the column from a
varchar(36) to a uniqueidentifier will greatly reduce storage requirements from 36 bytes per row to 16 bytes per row; for each 100,000 row table, that equates to a savings of 2MB. 2MB may not sound like much by itself, but since the column is the primary key, that 2MB applies to each non-clustered index on the table. If any other tables are using foreign keys to ensure relational integrity, the space savings applies to those tables as well. Realize also that this space savings doesn't only apply to on-disk storage, it also applies equally, and one might argue, more importantly, to rows contained in the buffer-pool. It appears from a cursory glance at your sample table above that the average row size is somewhere around 600 bytes. Converting the
varchar(36) to a uniqueidentifier will immediately offer a 3.3% space savings for the clustered index, and more than that for each non-clustered index.As a bonus, once you convert the column from a
varchar(36) to a uniqueidentifier, it will no longer be possible to insert this is a bad key into the column, as is currently possible. This will result in much better data integrity, and will potentially save you a lot of work down the road.Having said that, I decided to try to recreate the problems you're seeing with index fragmentation. I created a test table with the
varchar(36) primary key column with 100,000 rows. I then converted the column to a uniqueidentifier and was unable to see a drastic increase in index fragmentation. My test bed code is:USE tempdb;
/* create our test table, with a VARCHAR primary key */
IF OBJECT_ID('dbo.guid_insert') IS NOT NULL
DROP TABLE dbo.guid_insert;
CREATE TABLE dbo.guid_insert
(
PK VARCHAR(36) NOT NULL
CONSTRAINT PK_guid_insert
PRIMARY KEY CLUSTERED
WITH (
DATA_COMPRESSION = NONE
, PAD_INDEX = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
, FILLFACTOR = 80
)
, SomeData VARCHAR(600)
, CreatedDate DATETIME
CONSTRAINT DF_quid_insert_CreatedDate
DEFAULT (GETDATE())
) ON [PRIMARY];
/* insert 100,000 rows into the test table */
INSERT INTO dbo.guid_insert (PK, SomeData)
SELECT NEWID(), REPLICATE('.', 600)
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v1(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v2(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v3(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v4(num);
/* show the fragmentation details */
SELECT ObjectName = QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
, IndexName = QUOTENAME(i.name)
, [FillFactor] = 'FILLFACTOR = ' + CONVERT(VARCHAR(50), CASE WHEN i.fill_factor = 0 THEN '100' ELSE i.fill_factor END)
, CompressOption = 'DATA_COMPRESSION = ' + p.data_compression_desc
, NumberOfFragments = ips.fragment_count
, NumberOfPages = ips.page_count
, AvgFragmentSizeInPages = ips.avg_fragment_size_in_pages
, AvgFragmentationInPercent = ips.avg_fragmentation_in_percent
, IndexType = i.type_desc
, IsUnique = i.is_unique
, IsPrimary = i.is_primary_key
, IsPartitioned = CASE WHEN ps.data_space_id IS NULL THEN 0 ELSE 1 END
, IsClustered = CASE WHEN i.type = 1 THEN 1 ELSE 0 END
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON i.object_id = o.object_id
INNER JOIN sys.schemas s WITH (NOLOCK) ON o.schema_id = s.schema_id
INNER JOIN sys.data_spaces ds WITH (NOLOCK) ON i.data_space_id = ds.data_space_id
INNER JOIN sys.partitions p WITH (NOLOCK) ON o.object_id = p.object_id
AND i.index_id = p.index_id
LEFT JOIN sys.partition_schemes ps WITH (NOLOCK) ON ds.data_space_id = ps.data_space_id
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), o.object_id, i.index_id, p.partition_number, 'DETAILED') ips
WHERE o.is_ms_shipped = 0
AND NOT (
o.type = 'TF' -- table valued function
OR o.type = 'TT' -- table type
OR o.type = 'SO' -- sequence object
)
AND i.index_id > 0
AND i.is_disabled = 0
AND i.is_hypothetical = 0
AND (
ds.type = 'FG' -- filegroup
OR ds.type = 'PS' -- partition stream
)
AND i.type_desc IN
( --we only support rebuilding/reorganizing these index types:
'CLUSTERED'
, 'NONCLUSTERED'
, 'XML'
)
AND ips.index_level = 0 -- leaf-levels only
AND ips.fragment_count > 1
ORDER BY ips.fragment_count * ips.avg_fragmentation_in_percent DESC;The fragmentation query results:
```
/ rebuild the index to remove fragmentation /
ALTER INDEX PK_guid_insert ON dbo.guid_insert REBUILD;
/* drop the primary key, since w
Code Snippets
USE tempdb;
/* create our test table, with a VARCHAR primary key */
IF OBJECT_ID('dbo.guid_insert') IS NOT NULL
DROP TABLE dbo.guid_insert;
CREATE TABLE dbo.guid_insert
(
PK VARCHAR(36) NOT NULL
CONSTRAINT PK_guid_insert
PRIMARY KEY CLUSTERED
WITH (
DATA_COMPRESSION = NONE
, PAD_INDEX = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
, FILLFACTOR = 80
)
, SomeData VARCHAR(600)
, CreatedDate DATETIME
CONSTRAINT DF_quid_insert_CreatedDate
DEFAULT (GETDATE())
) ON [PRIMARY];
/* insert 100,000 rows into the test table */
INSERT INTO dbo.guid_insert (PK, SomeData)
SELECT NEWID(), REPLICATE('.', 600)
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v1(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v2(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v3(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v4(num);
/* show the fragmentation details */
SELECT ObjectName = QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
, IndexName = QUOTENAME(i.name)
, [FillFactor] = 'FILLFACTOR = ' + CONVERT(VARCHAR(50), CASE WHEN i.fill_factor = 0 THEN '100' ELSE i.fill_factor END)
, CompressOption = 'DATA_COMPRESSION = ' + p.data_compression_desc
, NumberOfFragments = ips.fragment_count
, NumberOfPages = ips.page_count
, AvgFragmentSizeInPages = ips.avg_fragment_size_in_pages
, AvgFragmentationInPercent = ips.avg_fragmentation_in_percent
, IndexType = i.type_desc
, IsUnique = i.is_unique
, IsPrimary = i.is_primary_key
, IsPartitioned = CASE WHEN ps.data_space_id IS NULL THEN 0 ELSE 1 END
, IsClustered = CASE WHEN i.type = 1 THEN 1 ELSE 0 END
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON i.object_id = o.object_id
INNER JOIN sys.schemas s WITH (NOLOCK) ON o.schema_id = s.schema_id
INNER JOIN sys.data_spaces ds WITH (NOLOCK) ON i.data_space_id = ds.data_space_id
INNER JOIN sys.partitions p WITH (NOLOCK) ON o.object_id = p.object_id
AND i.index_id = p.index_id
LEFT JOIN sys.partition_schemes ps WITH (NOLOCK) ON ds.data_space_id = ps.data_space_id
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), o.object_id, i.index_id, p.partition_number, 'DETAILED') ips
WHERE o.is_ms_shipped = 0
AND NOT (
o.type = 'TF' -- table valued function
OR o.type = 'TT' -- table type
OR o.type = 'SO' -- sequence object
)
AND i.index_id > 0
AND i.is_disabled = 0
AND i.is_hypothetical = 0
AND (
ds.type = 'FG' -- filegroup
OR ds.type = 'PS' -- partition stream
)
AND i.type_desc IN
( --we only support rebuilding/reorganizing these index types:
'CLUSTERED'
, 'NONCLUSTERED/* rebuild the index to remove fragmentation */
ALTER INDEX PK_guid_insert ON dbo.guid_insert REBUILD;
/* drop the primary key, since we can't modify the PK column when it is a primary key */
ALTER TABLE dbo.guid_insert
DROP CONSTRAINT PK_guid_insert;
/* convert the VARCHAR PK column into a UNIQUEIDENTIFIER column */
ALTER TABLE dbo.guid_insert
ALTER COLUMN PK UNIQUEIDENTIFIER NOT NULL;
/* recreate the primary key */
ALTER TABLE dbo.guid_insert
ADD CONSTRAINT PK_guid_insert
PRIMARY KEY CLUSTERED (PK);
/* add a DEFAULT constraint to create sequential keys */
ALTER TABLE dbo.guid_insert
ADD CONSTRAINT DF_guid_insert_PK
DEFAULT (NEWSEQUENTIALID())
FOR PK;
/* delete 10,000 random rows */
;WITH src AS
(
SELECT rn = ROW_NUMBER() OVER (ORDER BY NEWID()), *
FROM dbo.guid_insert
)
DELETE
FROM src
WHERE rn < 10000;
/* insert 100,000 more rows */
INSERT INTO dbo.guid_insert (SomeData)
SELECT REPLICATE('.', 600)
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v1(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v2(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v3(num)
CROSS APPLY (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v4(num);
/* show the fragmentation details */
SELECT ObjectName = QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
, IndexName = QUOTENAME(i.name)
, [FillFactor] = 'FILLFACTOR = ' + CONVERT(VARCHAR(50), CASE WHEN i.fill_factor = 0 THEN '100' ELSE i.fill_factor END)
, CompressOption = 'DATA_COMPRESSION = ' + p.data_compression_desc
, NumberOfFragments = ips.fragment_count
, NumberOfPages = ips.page_count
, AvgFragmentSizeInPages = ips.avg_fragment_size_in_pages
, AvgFragmentationInPercent = ips.avg_fragmentation_in_percent
, IndexType = i.type_desc
, IsUnique = i.is_unique
, IsPrimary = i.is_primary_key
, IsPartitioned = CASE WHEN ps.data_space_id IS NULL THEN 0 ELSE 1 END
, IsClustered = CASE WHEN i.type = 1 THEN 1 ELSE 0 END
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON i.object_id = o.object_id
INNER JOIN sys.schemas s WITH (NOLOCK) ON o.schema_id = s.schema_id
INNER JOIN sys.data_spaces ds WITH (NOLOCK) ON i.data_space_id = ds.data_space_id
INNER JOIN sys.partitions p WITH (NOLOCK) ON o.object_id = p.object_id
AND i.index_id = p.index_id
LEFT JOIN sys.partition_schemes ps WITH (NOLOCK) ON ds.data_space_id = ps.data_space_id
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), o.object_id, i.index_id, p.partition_number, 'DETAILED') ips
WHERE o.is_ms_shipped = 0
AND NOT (
o.type = 'TF' -- table valued function
OR o.type = 'TT' -- table type
OR o.type = 'SO' -- sequence object
)
AND i.index_id > 0
AND i.is_disabled = 0
AND i.is_hypothetical = 0
AND (
ds.type = 'FG' -- filegroup
OR dContext
StackExchange Database Administrators Q#115432, answer score: 2
Revisions (0)
No revisions yet.