patternsqlMajor
Wrapping query in IF EXISTS makes it very slow
Viewed 0 times
existsquerymakesslowwrappingvery
Problem
I have the below query :
The above query completes in three seconds.
If the above query returns any value, we want the stored procedure to EXIT,so I rewrote it like below:
However this is taking 10 minutes.
I can rewrite the above query like below, which also completes in less than 3 seconds:
The issue with above rewrite is that the above query is part of bigger stored procedure and it returns multiple result sets. In C#,we iterate through each result set and do some processing.
The above returns an empty result set, so if I go with this approach, I have to change my C# and do deployment again.
So my question is,
why does using just
Below are the details which may help you and let me know if you need any details:
-
Fast Execution plan
Slow plan using Brentozar Paste the plan
Fast Plan using Brentozar Paste the plan
Note: Both queries are the same (using parameters), the only difference is
The table creation scripts are below:
http://pastebin.com/CgSHeqXc -- small table stats
http://pastebin.com/GUu9KfpS -- big table stats
select databasename
from somedb.dbo.bigtable l where databasename ='someval' and source <>'kt'
and not exists(select 1 from dbo.smalltable c where c.source=l.source)The above query completes in three seconds.
If the above query returns any value, we want the stored procedure to EXIT,so I rewrote it like below:
If Exists(
select databasename
from somedb.dbo.bigtable l where databasename ='someval' and source <>'kt'
and not exists(select 1 from dbo.smalltable c where c.source=l.source)
)
Begin
Raiserror('Source missing',16,1)
Return
EndHowever this is taking 10 minutes.
I can rewrite the above query like below, which also completes in less than 3 seconds:
select databasename
from somedb.dbo.bigtable l where databasename ='someval' and source <>'kt'
and not exists(select 1 from dbo.smalltable c where c.source=l.source
if @@rowcount >0
Begin
Raiserror('Source missing',16,1)
Return
EndThe issue with above rewrite is that the above query is part of bigger stored procedure and it returns multiple result sets. In C#,we iterate through each result set and do some processing.
The above returns an empty result set, so if I go with this approach, I have to change my C# and do deployment again.
So my question is,
why does using just
IF EXISTS changes the plan to take so much time?Below are the details which may help you and let me know if you need any details:
- Create table and statistics script to get same plan as mine
- Slow Execution plan
-
Fast Execution plan
Slow plan using Brentozar Paste the plan
Fast Plan using Brentozar Paste the plan
Note: Both queries are the same (using parameters), the only difference is
EXISTS (I might have made some mistakes while anonymizing though).The table creation scripts are below:
http://pastebin.com/CgSHeqXc -- small table stats
http://pastebin.com/GUu9KfpS -- big table stats
Solution
As has been explained by Paul White in his blog post: Inside the Optimizer: Row Goals In Depth the
As a final example, consider that a logical semi-join (such as a
sub-query introduced with EXISTS) shares the overall theme: it should
be optimised to find the first matching row quickly.
In your query this apparently happens to introduce nested loops and remove parallelism, resulting in a slower plan.
So you would probably need to find a way to rewrite your query without using the
You might get away with rewriting your query using a
You could probably use a
Mind you, Aaron Bertrand has a blog post providing reasons why he prefers NOT EXISTS which you should read through to see if other approaches work better, and to be aware of the potential correctness issues in case of NULL values.
Related Q & A: IF EXISTS taking longer than embedded select statement
EXISTS introduces a row goal, which prefers NESTED LOOPS or MERGE JOIN over HASH MATCHAs a final example, consider that a logical semi-join (such as a
sub-query introduced with EXISTS) shares the overall theme: it should
be optimised to find the first matching row quickly.
In your query this apparently happens to introduce nested loops and remove parallelism, resulting in a slower plan.
So you would probably need to find a way to rewrite your query without using the
NOT EXISTS from your query.You might get away with rewriting your query using a
LEFT OUTER JOIN and checking there wasn't a row in smalltable by testing for NULLIf EXISTS(
SELECT databasename
FROM somedb.dbo.bigtable l
LEFT JOIN dbo.smalltable c ON c.source = l.source
WHERE databasename = 'someval'
AND source <> 'kt'
AND c.source IS NULL
)You could probably use a
EXCEPT query too, depending on how many fields you need to compare on like this:If EXISTS(
SELECT source
FROM somedb.dbo.bigtable l
WHERE databasename = 'someval'
AND source <> 'kt'
EXCEPT
SELECT source
FROM dbo.smalltable
)Mind you, Aaron Bertrand has a blog post providing reasons why he prefers NOT EXISTS which you should read through to see if other approaches work better, and to be aware of the potential correctness issues in case of NULL values.
Related Q & A: IF EXISTS taking longer than embedded select statement
Code Snippets
If EXISTS(
SELECT databasename
FROM somedb.dbo.bigtable l
LEFT JOIN dbo.smalltable c ON c.source = l.source
WHERE databasename = 'someval'
AND source <> 'kt'
AND c.source IS NULL
)If EXISTS(
SELECT source
FROM somedb.dbo.bigtable l
WHERE databasename = 'someval'
AND source <> 'kt'
EXCEPT
SELECT source
FROM dbo.smalltable
)Context
StackExchange Database Administrators Q#157353, answer score: 22
Revisions (0)
No revisions yet.