HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Clustered index not used in a delete statement

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
clustereddeletestatementusedindexnot

Problem

I have a SQL Server table defined as follow

CREATE TABLE [dbo].[Production_Detail]
(
    [Id] [bigint] NOT NULL DEFAULT (NEXT VALUE FOR [dbo].[Production_Detail_Seq]),
    [Meta_Data_ID] INT NOT NULL , 
    [Production_Detail_Time] DATETIME NOT NULL,
    [Production_Detail_Time_Local] DATETIME NOT NULL,
    [Production_Detail_Value] FLOAT NULL,
    [IntegratedDM] BIT NOT NULL DEFAULT 0,
    [DailyIntegratedDM] BIT NOT NULL DEFAULT 0,
    [InsertedDate] DateTime NOT NULL,
    [ModifiedDate] DateTime NOT NULL,  

    CONSTRAINT [PK_Production_Detail] PRIMARY KEY ([Id]), 

    CONSTRAINT [UK_Production_Detail] UNIQUE ([Meta_Data_Id], [Production_Detail_Time]),

    CONSTRAINT [FK_Production_Detail_To_Meta_Data] FOREIGN KEY ([Meta_Data_ID]) REFERENCES [Meta_Data]([Meta_Data_ID]) 
)
GO

CREATE NONCLUSTERED INDEX [IX_Production_Detail_Production_Detail_Time_Local]
ON [dbo].[Production_Detail] ([Production_Detail_Time_Local])
INCLUDE ([Meta_Data_ID], [Production_Detail_Time], [Production_Detail_Value])
GO

CREATE UNIQUE NONCLUSTERED INDEX [IX_Production_Detail_Meta_Data_ID_Production_Detail_Time] 
ON [dbo].[Production_Detail] ([Meta_Data_ID] ASC, [Production_Detail_Time] ASC)
GO

CREATE NONCLUSTERED INDEX [IX_Production_Detail_IntegratedDM_MetaDataId]
ON [dbo].[Production_Detail] ([IntegratedDM], [Meta_Data_ID])
INCLUDE ([Production_Detail_Time])
GO


This table contains real time data, so records are consistently inserted. I'm trying to delete data that are older than 5 days. But when the delete is running, I have several insert statement that throws deadlock.

Here's the delete query. The #ToPurge table contains the ID that needs to be deleted. The #ToPurge table doesn't have any indexes. The number of rows may vary, but for my test case it had over 3 million:

```
Declare @RowCount int;
set @RowCount = @@ROWCOUNT;

while @RowCount > 0
begin
delete top(20000) pd
from #ToPurge tp
inner join [DataWarehouse].[dbo].[Production_Detail] pd on (tp.ID =

Solution

I'd like to know why the optimizer does not use the clustered index,
but is using the non-clustered one?

This will be a decision of the cost based optimizer. It estimates that it is cheaper to fully scan the narrow index.

It seems that you were expecting a nested loops with seeks on the clustered index?

The execution plan shows that the table #ToPurge has 3.9 million rows.

This will take 195 batches to clear out. Assuming that all rows in #ToPurge are unique and initially exist in Production_Detail and that the rows output from #ToPurge are in a consistent order for each batch (I'm ignoring the parallelism) then each successive batch will end up seeking all rows that have already been deleted in previous batches before finally getting to a patch of rows that have not yet been processed then in total your delete process will do more than a third of a billion index seeks for already deleted rows.

0 + 20,000 + 40,000 + ... + 3,880,000 = 378,300,000


This isn't the logic that the cost based optimizer uses (which will involve histograms of the two tables and a rowgoal for the TOP 20000) but it shows this is not a good strategy anyway.

You could try the following to cap the number of seeks at 20,000 per iteration and see if you now get your desired plan.

CREATE TABLE #ToPurge
  (
     [BatchId] [INT] NULL,
     [Id]      [BIGINT] NOT NULL,
     UNIQUE CLUSTERED([BatchId], [Id])
  );

/*Your insert statement to #ToPurge here*/

WITH CTE
     AS (SELECT *,
                1 + ( ROW_NUMBER() OVER (ORDER BY Id) / 20000 ) AS _BatchId
         FROM   #ToPurge)
UPDATE CTE
SET    [BatchId] = [_BatchId];

DECLARE @BatchCounter INT = 1;

WHILE @BatchCounter <= (SELECT MAX([BatchId])
                        FROM   #ToPurge)
  BEGIN
      DELETE pd
      FROM   #ToPurge tp
             INNER JOIN [DataWarehouse].[dbo].[Production_Detail] pd
               ON ( tp.ID = pd.ID )
      WHERE  tp.BatchId = @BatchCounter

      SET @BatchCounter+=1;
  END

Code Snippets

0 + 20,000 + 40,000 + ... + 3,880,000 = 378,300,000
CREATE TABLE #ToPurge
  (
     [BatchId] [INT] NULL,
     [Id]      [BIGINT] NOT NULL,
     UNIQUE CLUSTERED([BatchId], [Id])
  );

/*Your insert statement to #ToPurge here*/

WITH CTE
     AS (SELECT *,
                1 + ( ROW_NUMBER() OVER (ORDER BY Id) / 20000 ) AS _BatchId
         FROM   #ToPurge)
UPDATE CTE
SET    [BatchId] = [_BatchId];

DECLARE @BatchCounter INT = 1;

WHILE @BatchCounter <= (SELECT MAX([BatchId])
                        FROM   #ToPurge)
  BEGIN
      DELETE pd
      FROM   #ToPurge tp
             INNER JOIN [DataWarehouse].[dbo].[Production_Detail] pd
               ON ( tp.ID = pd.ID )
      WHERE  tp.BatchId = @BatchCounter

      SET @BatchCounter+=1;
  END

Context

StackExchange Database Administrators Q#141542, answer score: 10

Revisions (0)

No revisions yet.