patternsqlMinor
Howto Identify Clustered Index Candidates
Viewed 0 times
clusteredhowtoidentifyindexcandidates
Problem
I find myself in a situation where I have inherited the db and people complain about database performance. I identified about a dozend of tables not having a clustered index or a not carefully planned clustered index, some of them with lots of data and frequently used. I am actually struggeling to figure out the best candidates for implementation of a CI for the heaps. After a lot of reading I am still not sure what to do. I know it depends. So I would like to give one table as a sample and hope to get some suggestions.
The table has 122 Columns. Yes, I know...
It holds about 800 000 rows.
Structure (excerpt):
On this table exist 29 nonclustered indexes. I guess some of them have been created just in dispair...
I know that the design of this and other tables is not very well. It is way to wide and should be splitted. This is on my list.
The queries almost always filter on th column [OrID]. This column is not unique (highest no of rows with one single OrID is > 310 000).
All q
The table has 122 Columns. Yes, I know...
It holds about 800 000 rows.
Structure (excerpt):
CREATE TABLE [dbo].[tO](
[ID] [uniqueidentifier] NOT NULL,
[MaID] [uniqueidentifier] NULL,
[OrID] [uniqueidentifier] NOT NULL,
[UsID] [uniqueidentifier] NOT NULL,
[CrUsID] [uniqueidentifier] NOT NULL,
[SuID] [uniqueidentifier] NULL,
[SOID] [uniqueidentifier] NULL,
[DeA1ID] [uniqueidentifier] NULL,
[DeA2ID] [uniqueidentifier] NULL,
[PayID] [uniqueidentifier] NULL,
[MAdID] [uniqueidentifier] NULL,
[ShID] [uniqueidentifier] NULL,
[SessID] [varchar](32) NULL,
[OrUID] [uniqueidentifier] NULL,
[PurOfUID] [uniqueidentifier] NULL,
[AddiUID] [uniqueidentifier] NULL,
[SupUID] [uniqueidentifier] NULL,
[PayTID] [uniqueidentifier] NULL,
[OPSID] [uniqueidentifier] NULL,
[StoRSID] [uniqueidentifier] NULL,
[PayMeID] [uniqueidentifier] NULL,
...
[OrderDate] [datetime] NULL,
[CreateDate] [datetime] NOT NULL,
[UpdateDate] [datetime] NULL,
[RowVersion] [timestamp] NOT NULL,
[Deleted] bit NOT NULL,
...
[CTR] int itentity(1,1) NOT NULL
CONSTRAINT [PK_Order] PRIMARY KEY NONCLUSTERED <= uses column [ID]On this table exist 29 nonclustered indexes. I guess some of them have been created just in dispair...
I know that the design of this and other tables is not very well. It is way to wide and should be splitted. This is on my list.
The queries almost always filter on th column [OrID]. This column is not unique (highest no of rows with one single OrID is > 310 000).
All q
Solution
Original Post
Honestly, in your situation I'd just go ahead and cluster the [ID]. It is worrying that there appears to be no default on it, but at the same time, maybe if you're lucky, it's being populated with some form of
Doing so will allow you to start focusing on reworking some filtered / include indexes and when you're done with that, maybe you'll have 4 sore spots instead of 12.
Any advice beyond that will probably result in "tip of the iceberg" classification for your problem - as you seem to be well aware, the question is really about abysmal design, rather than engine performance.
Update
I couldn't figure out a way to answer your comment clearly, so here's way too much information instead.
As it has been stated, if you have a thin, accessible, incremental columns, like an
However:
If no such column exists and you're looking at heaps with 10+ indexes on them ( and -still- ending up doing RID lookups!! ), just cluster the
Ignoring the index sprawl and focusing on just clustering your heaps, using a
In regards to potential page splits using a non-optimal clustering key, while it is a valid concern, if you happen to have access to a Sql Server 2012+ installation, you can follow along with this demonstration, starting with setting up an extended event session to track page splits:
Once that's up and running, you can set up and seed a few tables to get an idea of the impact different kinds of clustering keys under differing fill factors can have on index density and the number of page splits inserts into them can cause. For the example, I'm using a good clustering example with
```
USE tempdb;
GO
IF NOT EXISTS ( SELECT 1
FROM sys.objects
WHERE name = 'PageSplitIdentity'
AND type = 'U' )
BEGIN
--DROP TABLE dbo.PageSplitIdentity;
CREATE TABLE dbo.PageSplitIdentity
(
PageSplitIdentity_PK INTEGER IDENTITY( 1, 1 ) NOT NULL,
Foo VARBINARY( 512 ) NOT NULL
);
ALTER TABLE dbo.PageSplitIdentity
ADD CONSTRAINT PK__PageSplitIdentity
PRIMARY KEY CLUSTERED ( PageSplitIdentity_PK )
WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 )
ON [PRIMARY];
ALTER TABLE dbo.PageSplitIdentity
ADD CONSTRAINT DF__PageSplitIdentity__Foo
DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;
SET NOCOUNT ON;
DECLARE @i INTEGER = 0;
WHILE ( @i < 2500000 )
BEGIN
Honestly, in your situation I'd just go ahead and cluster the [ID]. It is worrying that there appears to be no default on it, but at the same time, maybe if you're lucky, it's being populated with some form of
NEWSEQUENTIALID call. If not, it isn't the end of the world, < 1MM rows shouldn't be a show stopper if the indexes / stats are being maintained.Doing so will allow you to start focusing on reworking some filtered / include indexes and when you're done with that, maybe you'll have 4 sore spots instead of 12.
Any advice beyond that will probably result in "tip of the iceberg" classification for your problem - as you seem to be well aware, the question is really about abysmal design, rather than engine performance.
Update
I couldn't figure out a way to answer your comment clearly, so here's way too much information instead.
As it has been stated, if you have a thin, accessible, incremental columns, like an
IDENTITY column, put a clustered index on that - it's a no-brainer. However:
If no such column exists and you're looking at heaps with 10+ indexes on them ( and -still- ending up doing RID lookups!! ), just cluster the
ID for now, even if it's a UNIQUEIDENTIFIER. You need, more than anything else, to clean up your storage utilization, and that's going to start with clustering -something-, if only to pull that forward fetch data back in line with rest of the row. Even if you immediately drop the clustered index afterwards, you'd at least have organized your heap for now, dramatically reducing forward pointers ( if not completely eliminating ) and ultimately reducing contention on your ( needlessly overworked ) I/O subsystem. There is literally no downside to doing this, so do it. Then you can go about figuring out why there 29 available indexes and queries still need to go RID matching. Ignoring the index sprawl and focusing on just clustering your heaps, using a
FILLFACTOR around 70 should be fine for dealing with UNIQUEIDENTIFIER values. This is because b-trees, the actual data structure SQL Server uses for indexes, tend towards around 69% node utilization under INSERT only operation of random values. This number is a mathematical truth a couple of pretty bright fellas, Lehman and Yao, proved during their research into the subject. Throwing DELETEs in sufficient quantities into the mix can lower the node utilization target, but as a general rule, a FILLFACTOR of 70 is a good place to start tuning non-sequential / random clustered indexes, since you're effectively telling SQL Server to not even bother trying to fill 30% of each page, since you won't be using that part anyway - because you won't be, mathematically.In regards to potential page splits using a non-optimal clustering key, while it is a valid concern, if you happen to have access to a Sql Server 2012+ installation, you can follow along with this demonstration, starting with setting up an extended event session to track page splits:
DROP EVENT SESSION [TrackPageSplits]
ON SERVER;
GO
CREATE EVENT SESSION [TrackPageSplits]
ON SERVER
ADD EVENT sqlserver.transaction_log(
WHERE operation = 11 -- LOP_DELETE_SPLIT
AND database_id = 2 -- Watch TempDB;
)
ADD TARGET package0.histogram(
SET filtering_event_name = 'sqlserver.transaction_log',
source_type = 0, -- Event Column
source = 'alloc_unit_id');
GO
-- Start the Event Session Again
ALTER EVENT SESSION [TrackPageSplits]
ON SERVER
STATE=START;
GOOnce that's up and running, you can set up and seed a few tables to get an idea of the impact different kinds of clustering keys under differing fill factors can have on index density and the number of page splits inserts into them can cause. For the example, I'm using a good clustering example with
INTEGER and IDENTITY under FILL_FACTOR 100, a terrible clustering example with UNIQUEIDENTIFIER and non-sequential NEWID() under FILL_FACTOR 100, and then three more non-sequential UNIQUEIDENTIFIER clustering keys under FILL_FACTORs of 75, 70 and 65, each seeded with 2.5MM records.```
USE tempdb;
GO
IF NOT EXISTS ( SELECT 1
FROM sys.objects
WHERE name = 'PageSplitIdentity'
AND type = 'U' )
BEGIN
--DROP TABLE dbo.PageSplitIdentity;
CREATE TABLE dbo.PageSplitIdentity
(
PageSplitIdentity_PK INTEGER IDENTITY( 1, 1 ) NOT NULL,
Foo VARBINARY( 512 ) NOT NULL
);
ALTER TABLE dbo.PageSplitIdentity
ADD CONSTRAINT PK__PageSplitIdentity
PRIMARY KEY CLUSTERED ( PageSplitIdentity_PK )
WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 )
ON [PRIMARY];
ALTER TABLE dbo.PageSplitIdentity
ADD CONSTRAINT DF__PageSplitIdentity__Foo
DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;
SET NOCOUNT ON;
DECLARE @i INTEGER = 0;
WHILE ( @i < 2500000 )
BEGIN
Code Snippets
DROP EVENT SESSION [TrackPageSplits]
ON SERVER;
GO
CREATE EVENT SESSION [TrackPageSplits]
ON SERVER
ADD EVENT sqlserver.transaction_log(
WHERE operation = 11 -- LOP_DELETE_SPLIT
AND database_id = 2 -- Watch TempDB;
)
ADD TARGET package0.histogram(
SET filtering_event_name = 'sqlserver.transaction_log',
source_type = 0, -- Event Column
source = 'alloc_unit_id');
GO
-- Start the Event Session Again
ALTER EVENT SESSION [TrackPageSplits]
ON SERVER
STATE=START;
GOUSE tempdb;
GO
IF NOT EXISTS ( SELECT 1
FROM sys.objects
WHERE name = 'PageSplitIdentity'
AND type = 'U' )
BEGIN
--DROP TABLE dbo.PageSplitIdentity;
CREATE TABLE dbo.PageSplitIdentity
(
PageSplitIdentity_PK INTEGER IDENTITY( 1, 1 ) NOT NULL,
Foo VARBINARY( 512 ) NOT NULL
);
ALTER TABLE dbo.PageSplitIdentity
ADD CONSTRAINT PK__PageSplitIdentity
PRIMARY KEY CLUSTERED ( PageSplitIdentity_PK )
WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 )
ON [PRIMARY];
ALTER TABLE dbo.PageSplitIdentity
ADD CONSTRAINT DF__PageSplitIdentity__Foo
DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;
SET NOCOUNT ON;
DECLARE @i INTEGER = 0;
WHILE ( @i < 2500000 )
BEGIN
INSERT INTO dbo.PageSplitIdentity DEFAULT VALUES;
SET @i = @i + 1;
END;
SET NOCOUNT OFF;
END;
GO
IF NOT EXISTS ( SELECT 1
FROM sys.objects
WHERE name = 'PageSplitNewID'
AND type = 'U' )
BEGIN
--DROP TABLE dbo.PageSplitNewID;
CREATE TABLE dbo.PageSplitNewID
(
PageSplitNewID_PK UNIQUEIDENTIFIER NOT NULL,
Foo VARBINARY( 512 ) NOT NULL
);
ALTER TABLE dbo.PageSplitNewID
ADD CONSTRAINT PK__PageSplitNewID
PRIMARY KEY CLUSTERED ( PageSplitNewID_PK )
WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 )
ON [PRIMARY];
ALTER TABLE dbo.PageSplitNewID
ADD CONSTRAINT DF__PageSplitNewID__PageSplitNewID_PK
DEFAULT NEWID() FOR PageSplitNewID_PK;
ALTER TABLE dbo.PageSplitNewID
ADD CONSTRAINT DF__PageSplitNewID__Foo
DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;
SET NOCOUNT ON;
DECLARE @i INTEGER = 0;
WHILE ( @i < 2500000 )
BEGIN
INSERT INTO dbo.PageSplitNewID DEFAULT VALUES;
SET @i = @i + 1;
END;
SET NOCOUNT OFF;
END;
GO
IF NOT EXISTS ( SELECT 1
FROM sys.objects
WHERE name = 'PageSplitNewIDFillFactor75'
AND type = 'U' )
BEGIN
--DROP TABLE dbo.PageSplitNewIDFillFactor75;
CREATE TABLE dbo.PageSplitNewIDFillFactor75
(
PageSplitNewIDFillFactor75_PK
UNIQUEIDENTIFIER NOT NULL,
Foo VARBINARY( 512 ) NOT NULL
);
ALTER TABLE dbo.PageSplitNewIDFillFactor75
ADD CONSTRAINT PK__PageSplitNewIDFillFactor75
PRIMARY KEY CLUSTERED ( PageSplitNewIDFillFactor75_PK )
WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 75 )
ON [PRIMARY];
ALTER TABLE dbo.PageSplitNewIDFillFactor75
ADD CONSTRAINT DF__PageSplitNewIDFillFactor75__PageSplitNewIDFillFactor75_PK
DEFAULT NEWID() FOR PageSplitNewIDFillFactor75_PK;
ALTER TABLE dbo.PageSplitNewIDFillFactor75
ADD CONSTRAINT DF__PageSplitSELECT si.name, sips.avg_fragment_size_in_pages,
sips.page_count, sips.fragment_count,
sips.avg_fragmentation_in_percent,
sips.avg_page_space_used_in_percent
FROM sys.indexes si
CROSS APPLY sys.dm_db_index_physical_stats( DB_ID(), si.object_id, si.index_id, DEFAULT, 'DETAILED' ) sips
WHERE si.name IN ( 'PK__PageSplitIdentity', 'PK__PageSplitNewID',
'PK__PageSplitNewIDFillFactor75', 'PK__PageSplitNewIDFillFactor70', 'PK__PageSplitNewIDFillFactor65' )
AND sips.index_level = 0;SELECT si.name, sc.split_count, si.fill_factor
FROM ( SELECT allocation_unit_id = slot.value( '( value )[ 1 ]', 'BIGINT' ),
split_count = slot.value( '( @count )[ 1 ]', 'BIGINT' )
FROM ( SELECT target_data = CONVERT( XML, target_data )
FROM sys.dm_xe_sessions xs
INNER JOIN sys.dm_xe_session_targets xst
ON xs.address = xst.event_session_address
WHERE xs.name = 'TrackPageSplits'
AND xst.target_name = 'histogram' ) s
CROSS APPLY s.target_data.nodes( 'HistogramTarget/Slot' ) n ( slot ) ) sc
INNER JOIN sys.allocation_units sau
ON sc.allocation_unit_id = sau.allocation_unit_id
INNER JOIN sys.partitions sp
ON sau.container_id = sp.partition_id
INNER JOIN sys.indexes si
ON sp.object_id = si.object_id
AND sp.index_id = si.index_id;ALTER INDEX PK__PageSplitIdentity
ON dbo.PageSplitIdentity REBUILD;
ALTER INDEX PK__PageSplitNewID
ON dbo.PageSplitNewID REBUILD;
ALTER INDEX PK__PageSplitNewIDFillFactor75
ON dbo.PageSplitNewIDFillFactor75 REBUILD;
ALTER INDEX PK__PageSplitNewIDFillFactor70
ON dbo.PageSplitNewIDFillFactor70 REBUILD;
ALTER INDEX PK__PageSplitNewIDFillFactor65
ON dbo.PageSplitNewIDFillFactor65 REBUILD;Context
StackExchange Database Administrators Q#117180, answer score: 3
Revisions (0)
No revisions yet.