patternsqlMinor
SQL Server creates different plans when OR condition is re-arranged
Viewed 0 times
createsconditionsqlarrangeddifferentwhenserverplans
Problem
I was reviewing an under-performing query which looks like this:
I accidentally used SSMS query designer on this query and it re-wrote the query as follows:
If you look closely you'll notice that it simply expanded the
The resulting query was 50% smaller in terms of cost and 33% smaller in terms of execution time. I simply do not understand why re-arranging the
Paste the plan and screenshot:
Number of rows:
Notes:
WHERE manymany.Active = -1
AND manymany.Check1 = -1
AND manymany.WebsiteID = @P1
AND CURRENT_TIMESTAMP BETWEEN ISNULL(manymany.FromDate, '1950-01-01') AND ISNULL(manymany.UptoDate, '2050-01-01')
AND main.Active = -1
AND main.StatusID = 1
AND CURRENT_TIMESTAMP BETWEEN main.FromDate AND ISNULL(main.UptoDate, '2050-01-01')
AND (main.TextCol1 IS NOT NULL OR main.TextCol2 IS NOT NULL)
ORDER BY aux.SortCodeI accidentally used SSMS query designer on this query and it re-wrote the query as follows:
WHERE manymany.Active = -1
AND manymany.Check1 = -1
AND manymany.WebsiteID = @P2
AND CURRENT_TIMESTAMP BETWEEN ISNULL(manymany.FromDate, '1950-01-01') AND ISNULL(manymany.UptoDate, '2050-01-01')
AND main.Active = -1
AND main.StatusID = 1
AND CURRENT_TIMESTAMP BETWEEN main.FromDate AND ISNULL(main.UptoDate, '2050-01-01')
AND main.TextCol1 IS NOT NULL
OR manymany.Active = -1
AND manymany.Check1 = -1
AND manymany.WebsiteID = @P2
AND CURRENT_TIMESTAMP BETWEEN ISNULL(manymany.FromDate, '1950-01-01') AND ISNULL(manymany.UptoDate, '2050-01-01')
AND main.Active = -1
AND main.StatusID = 1
AND CURRENT_TIMESTAMP BETWEEN main.FromDate AND ISNULL(main.UptoDate, '2050-01-01')
AND main.TextCol2 IS NOT NULL
ORDER BY aux.SortCodeIf you look closely you'll notice that it simply expanded the
OR condition by repeating all conditions i.e. it changed a AND (b OR c) to (a AND b) OR (a AND c).The resulting query was 50% smaller in terms of cost and 33% smaller in terms of execution time. I simply do not understand why re-arranging the
OR condition changed the plan when both queries are identical (?). I could have expanded the OR condition myself by copy-pasting the conditions but why should I?Paste the plan and screenshot:
Number of rows:
main 2718
manymany 188761
aux 19Notes:
- TextCol1 and TextCol2 are
textdatatype and cannot be ind
Solution
But why doesn't SQL server sees both queries as one? After all, a AND
(b OR c) = (a AND b) OR (a AND c)?
Logically it is the same, and it will get the same results.
Assumptions
My assumptions are that for the 'faster' plan, the optimizer is not considering some filter statements at the top of the
the reasoning for getting these assumptions are based on this filter predicate:
This filter predicate uses the result of the join between the
Notice that EXPR1021 and EXPR1022 in this filter are expressions created from the scalar operator on the
This filter consists of two parts, the first one with
and the second one plain
As you can see, the only difference above and below the
VS
And the second part needs to be true no matter what, as they are
Resulting in extra calculations of the same functions, that in my opinion are not needed.
Again, my guess here is that the reason that sql server does these calculations is that it does not know that they are the same.
For some other parts of the where clause, it does know that these are the same, e.g. In the main table, the statusid = 1 is only evaluated once:
And in the
In the 'slow' plan, the statements are not added together with
End of assumptions
Comparison of the two plans
I think that you got lucky with the performance of the 'fast' plan, but that the 'fast' plan might turn ugly when the matching data increases. It can depend on where and when you are applying your filters (and other factors).
The fast plan filtering
In the 'fast' plan: sql server applies some of the filters after the join of
the
The columns from the
As a result, the same predicate is executed twice on the
For the predicates above and below the
But this is not the case for some of the seek predicates on the
After this the join happens, and an even bigger filter predicate on the results of the join between
Notice that EXPR1021 and EXPR1022 in this filter are expressions created from the scalar operator on the
This filter consists of two parts, the first one with
and the second one plain
As you can see, the only difference above and below the
VS
And the second part needs to be true no matter what, as they are
Resulting in extra calculations that in my opinion are not needed.
The slow plan filtering
In the 'slow' plan: sql server applies the filter directly to the
Main table as a result of the
Some other, sometimes overlapping information:
The slower plan
When we look at the slower plan, the clustered index PK_main is used, into a compute scalar, filter and nested loops operator:
When we compare this with the estimated rows to be returned, we see a difference:
It is estimating 93 rows to be returned by the predicate on the scan:
Which is actually about 20x less than what was expected, which is 1947 rows.
Afterwards, the Compute scalar or this statement:
```
, CASE WHEN TextC
(b OR c) = (a AND b) OR (a AND c)?
Logically it is the same, and it will get the same results.
Assumptions
My assumptions are that for the 'faster' plan, the optimizer is not considering some filter statements at the top of the
OR to be the same as some filter statements at the bottom. I might be entirely off base here.the reasoning for getting these assumptions are based on this filter predicate:
This filter predicate uses the result of the join between the
Main table and manymany table. Notice that EXPR1021 and EXPR1022 in this filter are expressions created from the scalar operator on the
manymany table.This filter consists of two parts, the first one with
(.. AND .. OR .. AND ..)and the second one plain
AND filtering(getdate()>=[Expr1021]
AND getdate()=[DB1].[dbo].[main].[FromDate]
AND getdate()=[Expr1021]
AND getdate()=[DB1].[dbo].[main].[FromDate]
AND getdate()=[DB1].[dbo].[main].[FromDate]
AND getdate()=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)As you can see, the only difference above and below the
OR in the first part of this filter is AND [DB1].[dbo].[main].[TextCol1] IS NOT NULLVS
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULLAnd the second part needs to be true no matter what, as they are
AND predicates without any OR's. Resulting in extra calculations of the same functions, that in my opinion are not needed.
Again, my guess here is that the reason that sql server does these calculations is that it does not know that they are the same.
For some other parts of the where clause, it does know that these are the same, e.g. In the main table, the statusid = 1 is only evaluated once:
And in the
manymany table, the same statement is evaluated twice:In the 'slow' plan, the statements are not added together with
OR clauses, and that is why the optimizer is generating a different plan, applying filter predicates on the tables separately (and no duplicate filters).End of assumptions
Comparison of the two plans
I think that you got lucky with the performance of the 'fast' plan, but that the 'fast' plan might turn ugly when the matching data increases. It can depend on where and when you are applying your filters (and other factors).
The fast plan filtering
In the 'fast' plan: sql server applies some of the filters after the join of
the
main table with the manymany table as a result of different combinations with the two OR's + (AND ... AND ... AND...) blocks. The columns from the
maintable are filtered after finding all the possible combinations with the manymany table.As a result, the same predicate is executed twice on the
manymany table:For the predicates above and below the
OR.But this is not the case for some of the seek predicates on the
main tableAfter this the join happens, and an even bigger filter predicate on the results of the join between
main and manymany happens, again for all the possible combinationsNotice that EXPR1021 and EXPR1022 in this filter are expressions created from the scalar operator on the
manymany table.This filter consists of two parts, the first one with
(.. AND .. OR .. AND ..)and the second one plain
AND filtering(getdate()>=[Expr1021]
AND getdate()=[DB1].[dbo].[main].[FromDate]
AND getdate()=[Expr1021]
AND getdate()=[DB1].[dbo].[main].[FromDate]
AND getdate()=[DB1].[dbo].[main].[FromDate]
AND getdate()=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)As you can see, the only difference above and below the
OR in the first part of this filter is AND [DB1].[dbo].[main].[TextCol1] IS NOT NULLVS
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULLAnd the second part needs to be true no matter what, as they are
AND predicates without any OR's. Resulting in extra calculations that in my opinion are not needed.
The slow plan filtering
In the 'slow' plan: sql server applies the filter directly to the
Main table as a result of the
AND (TextCol1 IS NOT NULL OR TextCol2 IS NOT NULL) part and then joins with the manymany table to filter out the rest to get to 15 rows. Main table filtersmanymany table filtersSome other, sometimes overlapping information:
The slower plan
When we look at the slower plan, the clustered index PK_main is used, into a compute scalar, filter and nested loops operator:
When we compare this with the estimated rows to be returned, we see a difference:
It is estimating 93 rows to be returned by the predicate on the scan:
Which is actually about 20x less than what was expected, which is 1947 rows.
Afterwards, the Compute scalar or this statement:
```
, CASE WHEN TextC
Code Snippets
(getdate()>=[Expr1021]
AND getdate()<=[Expr1022]
AND getdate()>=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL
OR getdate()>=[Expr1021]
AND getdate()<=[Expr1022]
AND getdate()>=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)
AND (getdate()>=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL OR getdate()>=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)AND [DB1].[dbo].[main].[TextCol1] IS NOT NULLAND [DB1].[dbo].[main].[TextCol2] IS NOT NULL(getdate()>=[Expr1021]
AND getdate()<=[Expr1022]
AND getdate()>=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL
OR getdate()>=[Expr1021]
AND getdate()<=[Expr1022]
AND getdate()>=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)
AND (getdate()>=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL OR getdate()>=[DB1].[dbo].[main].[FromDate]
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000')
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)AND [DB1].[dbo].[main].[TextCol1] IS NOT NULLContext
StackExchange Database Administrators Q#231428, answer score: 2
Revisions (0)
No revisions yet.