patternsqlModerate
Clustered index not used in a delete statement
Viewed 0 times
clustereddeletestatementusedindexnot
Problem
I have a SQL Server table defined as follow
This table contains real time data, so records are consistently inserted. I'm trying to delete data that are older than 5 days. But when the delete is running, I have several insert statement that throws deadlock.
Here's the delete query. The
```
Declare @RowCount int;
set @RowCount = @@ROWCOUNT;
while @RowCount > 0
begin
delete top(20000) pd
from #ToPurge tp
inner join [DataWarehouse].[dbo].[Production_Detail] pd on (tp.ID =
CREATE TABLE [dbo].[Production_Detail]
(
[Id] [bigint] NOT NULL DEFAULT (NEXT VALUE FOR [dbo].[Production_Detail_Seq]),
[Meta_Data_ID] INT NOT NULL ,
[Production_Detail_Time] DATETIME NOT NULL,
[Production_Detail_Time_Local] DATETIME NOT NULL,
[Production_Detail_Value] FLOAT NULL,
[IntegratedDM] BIT NOT NULL DEFAULT 0,
[DailyIntegratedDM] BIT NOT NULL DEFAULT 0,
[InsertedDate] DateTime NOT NULL,
[ModifiedDate] DateTime NOT NULL,
CONSTRAINT [PK_Production_Detail] PRIMARY KEY ([Id]),
CONSTRAINT [UK_Production_Detail] UNIQUE ([Meta_Data_Id], [Production_Detail_Time]),
CONSTRAINT [FK_Production_Detail_To_Meta_Data] FOREIGN KEY ([Meta_Data_ID]) REFERENCES [Meta_Data]([Meta_Data_ID])
)
GO
CREATE NONCLUSTERED INDEX [IX_Production_Detail_Production_Detail_Time_Local]
ON [dbo].[Production_Detail] ([Production_Detail_Time_Local])
INCLUDE ([Meta_Data_ID], [Production_Detail_Time], [Production_Detail_Value])
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_Production_Detail_Meta_Data_ID_Production_Detail_Time]
ON [dbo].[Production_Detail] ([Meta_Data_ID] ASC, [Production_Detail_Time] ASC)
GO
CREATE NONCLUSTERED INDEX [IX_Production_Detail_IntegratedDM_MetaDataId]
ON [dbo].[Production_Detail] ([IntegratedDM], [Meta_Data_ID])
INCLUDE ([Production_Detail_Time])
GOThis table contains real time data, so records are consistently inserted. I'm trying to delete data that are older than 5 days. But when the delete is running, I have several insert statement that throws deadlock.
Here's the delete query. The
#ToPurge table contains the ID that needs to be deleted. The #ToPurge table doesn't have any indexes. The number of rows may vary, but for my test case it had over 3 million:```
Declare @RowCount int;
set @RowCount = @@ROWCOUNT;
while @RowCount > 0
begin
delete top(20000) pd
from #ToPurge tp
inner join [DataWarehouse].[dbo].[Production_Detail] pd on (tp.ID =
Solution
I'd like to know why the optimizer does not use the clustered index,
but is using the non-clustered one?
This will be a decision of the cost based optimizer. It estimates that it is cheaper to fully scan the narrow index.
It seems that you were expecting a nested loops with seeks on the clustered index?
The execution plan shows that the table
This will take 195 batches to clear out. Assuming that all rows in
This isn't the logic that the cost based optimizer uses (which will involve histograms of the two tables and a rowgoal for the
You could try the following to cap the number of seeks at 20,000 per iteration and see if you now get your desired plan.
but is using the non-clustered one?
This will be a decision of the cost based optimizer. It estimates that it is cheaper to fully scan the narrow index.
It seems that you were expecting a nested loops with seeks on the clustered index?
The execution plan shows that the table
#ToPurge has 3.9 million rows.This will take 195 batches to clear out. Assuming that all rows in
#ToPurge are unique and initially exist in Production_Detail and that the rows output from #ToPurge are in a consistent order for each batch (I'm ignoring the parallelism) then each successive batch will end up seeking all rows that have already been deleted in previous batches before finally getting to a patch of rows that have not yet been processed then in total your delete process will do more than a third of a billion index seeks for already deleted rows.0 + 20,000 + 40,000 + ... + 3,880,000 = 378,300,000This isn't the logic that the cost based optimizer uses (which will involve histograms of the two tables and a rowgoal for the
TOP 20000) but it shows this is not a good strategy anyway.You could try the following to cap the number of seeks at 20,000 per iteration and see if you now get your desired plan.
CREATE TABLE #ToPurge
(
[BatchId] [INT] NULL,
[Id] [BIGINT] NOT NULL,
UNIQUE CLUSTERED([BatchId], [Id])
);
/*Your insert statement to #ToPurge here*/
WITH CTE
AS (SELECT *,
1 + ( ROW_NUMBER() OVER (ORDER BY Id) / 20000 ) AS _BatchId
FROM #ToPurge)
UPDATE CTE
SET [BatchId] = [_BatchId];
DECLARE @BatchCounter INT = 1;
WHILE @BatchCounter <= (SELECT MAX([BatchId])
FROM #ToPurge)
BEGIN
DELETE pd
FROM #ToPurge tp
INNER JOIN [DataWarehouse].[dbo].[Production_Detail] pd
ON ( tp.ID = pd.ID )
WHERE tp.BatchId = @BatchCounter
SET @BatchCounter+=1;
ENDCode Snippets
0 + 20,000 + 40,000 + ... + 3,880,000 = 378,300,000CREATE TABLE #ToPurge
(
[BatchId] [INT] NULL,
[Id] [BIGINT] NOT NULL,
UNIQUE CLUSTERED([BatchId], [Id])
);
/*Your insert statement to #ToPurge here*/
WITH CTE
AS (SELECT *,
1 + ( ROW_NUMBER() OVER (ORDER BY Id) / 20000 ) AS _BatchId
FROM #ToPurge)
UPDATE CTE
SET [BatchId] = [_BatchId];
DECLARE @BatchCounter INT = 1;
WHILE @BatchCounter <= (SELECT MAX([BatchId])
FROM #ToPurge)
BEGIN
DELETE pd
FROM #ToPurge tp
INNER JOIN [DataWarehouse].[dbo].[Production_Detail] pd
ON ( tp.ID = pd.ID )
WHERE tp.BatchId = @BatchCounter
SET @BatchCounter+=1;
ENDContext
StackExchange Database Administrators Q#141542, answer score: 10
Revisions (0)
No revisions yet.