debugsqlModerate
Workaround for Anti-Semi Join bug
Viewed 0 times
bugsemiantijoinforworkaround
Problem
I built the following SQL Server query, but it is encountering the anti-semi join defect in SQL Server 2005 which results in inaccurate cardinality estimates (1 -- urgh!) and runs forever. Since it is a longstanding production SQL Server I can't easily suggest to upgrade versions, and as such I cannot force the traceflag 4199 hint on this specific query.
I'm having a hard time refactoring the
Note that the
Any help would be appreciated!
I'm having a hard time refactoring the
WHERE AND NOT IN (SELECT). Can anyone care to help? I've made sure to try and use the best joins based on clustered key pairs.SELECT TOP 5000 d.doc2_id
,d.direction_cd
,a.address_type_cd
,d.external_identification
,s.hash_value
,d.publishdate
,d.sender_address_id AS [D2 Sender_Address_id]
,a.address_id AS [A Address_ID]
,d.message_size
,d.subject
,emi.employee_id
FROM assentor.emcsdbuser.doc2 d(NOLOCK)
INNER JOIN assentor.emcsdbuser.employee_msg_index emi(NOLOCK)
ON d.processdate = emi.processdate
AND d.doc2_id = emi.doc2_id
INNER LOOP JOIN assentor.emcsdbuser.doc2_address a(NOLOCK)
ON emi.doc2_id = a.doc2_id
AND emi.address_type_cd = a.address_type_cd
AND emi.address_id = a.address_id
INNER JOIN sis.dbo.sis s(NOLOCK) ON d.external_identification = s.external_identification
WHERE d.publishdate > '2008-01-01'
**AND d.doc2_id NOT IN (
SELECT doc2_id
FROM assentor.emcsdbuser.doc2_address d2a(NOLOCK)
WHERE d.doc2_id = d2a.doc2_id
AND d2a.address_type_cd = 'FRM'
)**
OPTION (FAST 10)Note that the
Employee_MSG_Index table is 500m rows, doc2 is 1.5b rows, SIS is ~500m rows.Any help would be appreciated!
Solution
Since it is a longstanding production SQL Server I can't easily suggest to upgrade versions
The anti semi join cardinality estimation bug is reproducible on all versions of SQL Server from 2005 to 2012 inclusive. All require trace flag 4199 to enable the fix, so upgrading would not solve your problem without activating 4199 (though there are many other good reasons to upgrade from 2005, of course).
...as such I cannot force the traceflag 4199 hint on this specific query.
If it is just one particular query that is affected, you can use
This hint effectively runs
You should also look at testing your whole system with 4199 enabled instance-wide. Plan regressions are possible, but overall you may find that the various optimizer fixes enabled by this flag are well worth it. All future plan-affecting query processor fixes require this flag to activate.
All that said...
As mentioned in ypercube's answer, the bug requires two or more join columns to manifest (among many details). The redundancy in your
Removing this redundancy will 'solve' the problem for this particular query, though other queries that really do have more than one join predicate will still be vulnerable.
Example
To illustrate, here is an example based on the CSS blog post linked in the question (but with a complete script!):
Sample data:
Test query using
The estimated execution plan shows an estimate of 1 row after the anti semi join:
Side note: In fact this is an example of another (rare) bug. Writing the
The same query with
The estimated execution plan now shows an estimate of 2046 rows, which is exactly right:
We can also remove the redundant predicate:
The execution plan happens to use an additional unrelated optimization (the Stream Aggregate), but the important point is that the post-join estimate is correct without having to enable 4199:
Multiple anti semi join columns
It is possible to express an anti semi join over multiple columns using
The execution plan shows the erroneous 1-row estimate:
With 4199, the issue is resolved:
Other syntaxes
Using
That issue with
That two column anti semi join produces the 1-r
The anti semi join cardinality estimation bug is reproducible on all versions of SQL Server from 2005 to 2012 inclusive. All require trace flag 4199 to enable the fix, so upgrading would not solve your problem without activating 4199 (though there are many other good reasons to upgrade from 2005, of course).
...as such I cannot force the traceflag 4199 hint on this specific query.
If it is just one particular query that is affected, you can use
OPTION (QUERYTRACEON 4199) to enable the trace flag for just that query. This query hint is documented and supported for use with 4199, and applies from SQL Server 2005 Service Pack 2 onward.This hint effectively runs
DBCC TRACEON (4199) and DBCC TRACEOFF (4199) around the query, and requires sysadmin permission as a result. If that is a problem, add the hint using a plan guide.You should also look at testing your whole system with 4199 enabled instance-wide. Plan regressions are possible, but overall you may find that the various optimizer fixes enabled by this flag are well worth it. All future plan-affecting query processor fixes require this flag to activate.
All that said...
As mentioned in ypercube's answer, the bug requires two or more join columns to manifest (among many details). The redundancy in your
NOT IN clause causes the optimizer to see two column comparisons (though logically there is only one), thereby exposing the bug.Removing this redundancy will 'solve' the problem for this particular query, though other queries that really do have more than one join predicate will still be vulnerable.
Example
To illustrate, here is an example based on the CSS blog post linked in the question (but with a complete script!):
CREATE TABLE dbo.tst_TAB1
(
c1 integer NOT NULL,
c2 integer NOT NULL,
c3 integer NOT NULL
);
CREATE TABLE dbo.tst_TAB2
(
c1 integer NOT NULL,
c2 integer NOT NULL,
c3 integer NOT NULL
);
CREATE INDEX i ON dbo.tst_TAB1 (c1, c2);
CREATE INDEX i ON dbo.tst_TAB2 (c1, c2);Sample data:
INSERT dbo.tst_TAB1
(c1, c2, c3)
SELECT
number, number, number
FROM master.dbo.spt_values
WHERE
[type] = N'P'
AND number BETWEEN 1 AND 2047;
INSERT dbo.tst_TAB2 (c1, c2, c3)
VALUES (1, 1, 1);Test query using
NOT IN with redundant predicate:SELECT
T1.c1
FROM tst_TAB1 AS t1
WHERE
t1.c1 NOT IN
(
SELECT
t2.c1
FROM tst_TAB2 AS t2
-- This is redundant!
WHERE
t2.c1 = t1.c1
);The estimated execution plan shows an estimate of 1 row after the anti semi join:
Side note: In fact this is an example of another (rare) bug. Writing the
WHERE clause as t1.c1 = t2.c1 instead of t2.c1 = t1.c1 allows the optimizer to see that the two join predicates are in fact the same, and the bug does not manifest.The same query with
OPTION (QUERYTRACEON 4199):SELECT
T1.c1
FROM tst_TAB1 AS t1
WHERE
t1.c1 NOT IN
(
SELECT
t2.c1
FROM tst_TAB2 AS t2
WHERE
t2.c1 = t1.c1
)
OPTION (QUERYTRACEON 4199);The estimated execution plan now shows an estimate of 2046 rows, which is exactly right:
We can also remove the redundant predicate:
SELECT
T1.c1
FROM tst_TAB1 AS t1
WHERE
t1.c1 NOT IN
(
SELECT
t2.c1
FROM tst_TAB2 AS t2
);The execution plan happens to use an additional unrelated optimization (the Stream Aggregate), but the important point is that the post-join estimate is correct without having to enable 4199:
Multiple anti semi join columns
It is possible to express an anti semi join over multiple columns using
NOT IN syntax. These cases will require 4199. For example, the next query joins on c1 and c2:SELECT
T1.c1
FROM tst_TAB1 AS t1
WHERE
t1.c1 NOT IN
(
SELECT
t2.c1
FROM tst_TAB2 AS t2
WHERE
t2.c2 = t1.c2
);The execution plan shows the erroneous 1-row estimate:
With 4199, the issue is resolved:
SELECT
T1.c1
FROM tst_TAB1 AS t1
WHERE
t1.c1 NOT IN
(
SELECT
t2.c1
FROM tst_TAB2 AS t2
WHERE
t2.c2 = t1.c2
)
OPTION (QUERYTRACEON 4199);Other syntaxes
Using
NOT IN in this way is best avoided, not least for the reasons mentioned in Books Online:That issue with
NOT IN and NULLs has been written about many times. There are many alternative syntaxes available, of which NOT EXISTS is my personal preference. Note that changing syntax will not avoid the cardinality estimation bug:SELECT
T1.c1
FROM dbo.tst_TAB1 AS t1
WHERE
NOT EXISTS
(
SELECT 1
FROM dbo.tst_TAB2 AS t2
WHERE
t2.c1 = t1.c1
AND t2.c2 = t1.c2
);That two column anti semi join produces the 1-r
Code Snippets
CREATE TABLE dbo.tst_TAB1
(
c1 integer NOT NULL,
c2 integer NOT NULL,
c3 integer NOT NULL
);
CREATE TABLE dbo.tst_TAB2
(
c1 integer NOT NULL,
c2 integer NOT NULL,
c3 integer NOT NULL
);
CREATE INDEX i ON dbo.tst_TAB1 (c1, c2);
CREATE INDEX i ON dbo.tst_TAB2 (c1, c2);INSERT dbo.tst_TAB1
(c1, c2, c3)
SELECT
number, number, number
FROM master.dbo.spt_values
WHERE
[type] = N'P'
AND number BETWEEN 1 AND 2047;
INSERT dbo.tst_TAB2 (c1, c2, c3)
VALUES (1, 1, 1);SELECT
T1.c1
FROM tst_TAB1 AS t1
WHERE
t1.c1 NOT IN
(
SELECT
t2.c1
FROM tst_TAB2 AS t2
-- This is redundant!
WHERE
t2.c1 = t1.c1
);SELECT
T1.c1
FROM tst_TAB1 AS t1
WHERE
t1.c1 NOT IN
(
SELECT
t2.c1
FROM tst_TAB2 AS t2
WHERE
t2.c1 = t1.c1
)
OPTION (QUERYTRACEON 4199);SELECT
T1.c1
FROM tst_TAB1 AS t1
WHERE
t1.c1 NOT IN
(
SELECT
t2.c1
FROM tst_TAB2 AS t2
);Context
StackExchange Database Administrators Q#47127, answer score: 14
Revisions (0)
No revisions yet.