HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMajor

Why does a DELETE query run in one format much longer than in another?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whyformatmuchdeletelongerquerythanoneanotherdoes

Problem

I have specific cleanup code that tries to remove some duplicates.

This runs perfectly on many customer sites. The logs tell me that at least 1 sec up to 45 sec is consumed by this query:

DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
)


But I have a customer where this query runs for more than 4 hours (up to now and not ending)! I checked the DB (DBCC CHECKDB), I already update the statistics (sp_updatestats), also UPDATE STATISTICS [tbl] WITH FULLSCAN shows no change.

I have the original backup of DB from the customer. I run it on an SQL Server 14.0.2002.14. I have Standard Edition, the customer uses the Express Edition.

I can see in activity monitor that no one else is using the DB. There are no waits and the CPU is used by 25% (exactly 1 of my 4 CPUs). Also in this my test case no one else is using the DB.

I reformed the query and checked this statement:

DELETE FROM [tbl]
FROM [tbl] AS t
LEFT OUTER JOIN 
    (
        SELECT MIN([Id]) AS [IdMin]
        FROM [tbl]
        GROUP BY [IdProject], [IdRepresentative], [TimeStart]
    ) AS d ON d.[IdMin]=t.[Id]
WHERE d.[IdMin] IS NULL


This statement executes in just 1-4 seconds on the same DB.

What can I do with the table or the SQL DB to speed it up?

For me it seems to be a specific problem with DB situation/SQL Server Version. We have never seen this behavior on nearly 100 other sites.

The question is not about discussing that the second DELETE with JOIN style is better. I know this. But we have this other code currently in the production and I can't change it on the fly, but I want to make the customer happy.

Id is not nullable. It is a primary clustered ID. Creating an index is not an option. Because I can't influence the current running system. Something must be physically different.

There are definitely no locks! I just used a stand alone machine with the backup of the DB. And I just execu

Solution

This part of the plan is the problem.

Issue

The correct behaviour if the subquery brings back any NULL is to return 0 rows from the NOT IN.

Even if ID is not nullable (and therefore MIN(ID) cannot possibly be NULL when used as a vector aggregate) the datatype of MIN(ID) is regarded as nullable (it can still return NULL when used as a scalar aggregate against an empty table for example).

So you have this extra row count spool added to the plan whose job it is to ensure (in conjunction with an anti semi join) that no rows are emitted if a NULL is returned by the subquery.

Unfortunately even though the rows eliminated by the anti semi join on this spool are likely to be 0 and all 257,246 rows will flow onto the next operator the cardinality estimation reduces the estimated number of rows going past that step to 1.

As a result it has a scan of the table on the inside of the nested loops with estimated 1 execution whereas in reality it will scan and aggregate the whole table 257,246 times.

The one-row estimate coming out of the Anti Semi join is a known bug that was fixed under trace flag 4199 quite a while ago now. See the related Q & A Workaround for Anti-Semi Join bug for some more background and links.

Solution

The bug only manifests on SQL Server 2017 for you because you have compatibility level 120 selected.

You should find you get a much better estimate for the Anti Semi Join with trace flag 4199 active, an OPTION (QUERYTRACEON 4199) hint, a OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES')) hint (directly or via plan guide) or for the database:

ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = ON;


The use hint QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140 is another option as of SQL Server 2017 CU10.

Which option you choose depends on how widely you want the optimizer hotfixes to apply. Compatibility levels are intended to be rather short-term things, so you should be planning to move to a more current setting, where this particular optimizer fix is on by default.

Repro

The following script reproduces the problem and a fix:

ALTER DATABASE CURRENT 
SET COMPATIBILITY_LEVEL = 120;
GO
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = OFF;
GO
DROP TABLE IF EXISTS dbo.tbl;
GO
CREATE TABLE dbo.tbl
(
    Id integer PRIMARY KEY, 
    IdProject integer NOT NULL, 
    IdRepresentative integer NOT NULL, 
    TimeStart datetime NOT NULL,

    INDEX i NONCLUSTERED
    (
        TimeStart, 
        IdRepresentative, 
        IdProject
    )
);
GO
UPDATE STATISTICS dbo.tbl 
WITH 
    ROWCOUNT = 257246, 
    PAGECOUNT = 25725;


DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 
OPTION 
(
    MAXDOP 1
);


DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 
OPTION 
(
    MAXDOP 1,
    USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES')
);


Alternative Syntax

Ideally you should also rewrite the query to not use the problematic NOT IN. A possible alternative, that is likely to be more efficient even with the above fix, would be

DELETE T
FROM   (SELECT ROW_NUMBER() OVER (PARTITION BY IdProject, IdRepresentative, TimeStart ORDER BY Id) AS RN
        FROM   tbl) T
WHERE  RN > 1

Code Snippets

ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = ON;
ALTER DATABASE CURRENT 
SET COMPATIBILITY_LEVEL = 120;
GO
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = OFF;
GO
DROP TABLE IF EXISTS dbo.tbl;
GO
CREATE TABLE dbo.tbl
(
    Id integer PRIMARY KEY, 
    IdProject integer NOT NULL, 
    IdRepresentative integer NOT NULL, 
    TimeStart datetime NOT NULL,

    INDEX i NONCLUSTERED
    (
        TimeStart, 
        IdRepresentative, 
        IdProject
    )
);
GO
UPDATE STATISTICS dbo.tbl 
WITH 
    ROWCOUNT = 257246, 
    PAGECOUNT = 25725;
DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 
OPTION 
(
    MAXDOP 1
);
DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 
OPTION 
(
    MAXDOP 1,
    USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES')
);
DELETE T
FROM   (SELECT ROW_NUMBER() OVER (PARTITION BY IdProject, IdRepresentative, TimeStart ORDER BY Id) AS RN
        FROM   tbl) T
WHERE  RN > 1

Context

StackExchange Database Administrators Q#228938, answer score: 24

Revisions (0)

No revisions yet.