HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SQL Server creates different plans when OR condition is re-arranged

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
createsconditionsqlarrangeddifferentwhenserverplans

Problem

I was reviewing an under-performing query which looks like this:

WHERE manymany.Active = -1
  AND manymany.Check1 = -1
  AND manymany.WebsiteID = @P1
  AND CURRENT_TIMESTAMP BETWEEN ISNULL(manymany.FromDate, '1950-01-01') AND ISNULL(manymany.UptoDate, '2050-01-01')
  AND main.Active = -1
  AND main.StatusID = 1
  AND CURRENT_TIMESTAMP BETWEEN main.FromDate AND ISNULL(main.UptoDate, '2050-01-01')
  AND (main.TextCol1 IS NOT NULL OR main.TextCol2 IS NOT NULL)
ORDER BY aux.SortCode


I accidentally used SSMS query designer on this query and it re-wrote the query as follows:

WHERE manymany.Active = -1
  AND manymany.Check1 = -1
  AND manymany.WebsiteID = @P2
  AND CURRENT_TIMESTAMP BETWEEN ISNULL(manymany.FromDate, '1950-01-01') AND ISNULL(manymany.UptoDate, '2050-01-01')
  AND main.Active = -1
  AND main.StatusID = 1
  AND CURRENT_TIMESTAMP BETWEEN main.FromDate AND ISNULL(main.UptoDate, '2050-01-01')
  AND main.TextCol1 IS NOT NULL

   OR manymany.Active = -1
  AND manymany.Check1 = -1
  AND manymany.WebsiteID = @P2
  AND CURRENT_TIMESTAMP BETWEEN ISNULL(manymany.FromDate, '1950-01-01') AND ISNULL(manymany.UptoDate, '2050-01-01')
  AND main.Active = -1
  AND main.StatusID = 1
  AND CURRENT_TIMESTAMP BETWEEN main.FromDate AND ISNULL(main.UptoDate, '2050-01-01')
  AND main.TextCol2 IS NOT NULL
ORDER BY aux.SortCode


If you look closely you'll notice that it simply expanded the OR condition by repeating all conditions i.e. it changed a AND (b OR c) to (a AND b) OR (a AND c).

The resulting query was 50% smaller in terms of cost and 33% smaller in terms of execution time. I simply do not understand why re-arranging the OR condition changed the plan when both queries are identical (?). I could have expanded the OR condition myself by copy-pasting the conditions but why should I?

Paste the plan and screenshot:

Number of rows:

main     2718
manymany 188761
aux      19


Notes:

  • TextCol1 and TextCol2 are text datatype and cannot be ind

Solution

But why doesn't SQL server sees both queries as one? After all, a AND
(b OR c) = (a AND b) OR (a AND c)?

Logically it is the same, and it will get the same results.

Assumptions

My assumptions are that for the 'faster' plan, the optimizer is not considering some filter statements at the top of the OR to be the same as some filter statements at the bottom. I might be entirely off base here.

the reasoning for getting these assumptions are based on this filter predicate:

This filter predicate uses the result of the join between the Main table and manymany table.

Notice that EXPR1021 and EXPR1022 in this filter are expressions created from the scalar operator on the manymany table.

This filter consists of two parts, the first one with (.. AND .. OR .. AND ..)
and the second one plain AND filtering

(getdate()>=[Expr1021] 
AND getdate()=[DB1].[dbo].[main].[FromDate] 
AND getdate()=[Expr1021] 
AND getdate()=[DB1].[dbo].[main].[FromDate] 
 AND getdate()=[DB1].[dbo].[main].[FromDate] 
 AND getdate()=[DB1].[dbo].[main].[FromDate] 
 AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
 AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)


As you can see, the only difference above and below the OR in the first part of this filter is

AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL


VS

AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL


And the second part needs to be true no matter what, as they are AND predicates without any OR's.

Resulting in extra calculations of the same functions, that in my opinion are not needed.
Again, my guess here is that the reason that sql server does these calculations is that it does not know that they are the same.

For some other parts of the where clause, it does know that these are the same, e.g. In the main table, the statusid = 1 is only evaluated once:

And in the manymany table, the same statement is evaluated twice:

In the 'slow' plan, the statements are not added together with OR clauses, and that is why the optimizer is generating a different plan, applying filter predicates on the tables separately (and no duplicate filters).

End of assumptions

Comparison of the two plans

I think that you got lucky with the performance of the 'fast' plan, but that the 'fast' plan might turn ugly when the matching data increases. It can depend on where and when you are applying your filters (and other factors).

The fast plan filtering

In the 'fast' plan: sql server applies some of the filters after the join of
the main table with the manymany table as a result of different combinations with the two OR's + (AND ... AND ... AND...) blocks.
The columns from the maintable are filtered after finding all the possible combinations with the manymany table.

As a result, the same predicate is executed twice on the manymany table:

For the predicates above and below the OR.

But this is not the case for some of the seek predicates on the main table

After this the join happens, and an even bigger filter predicate on the results of the join between main and manymany happens, again for all the possible combinations

Notice that EXPR1021 and EXPR1022 in this filter are expressions created from the scalar operator on the manymany table.

This filter consists of two parts, the first one with (.. AND .. OR .. AND ..)
and the second one plain AND filtering

(getdate()>=[Expr1021] 
AND getdate()=[DB1].[dbo].[main].[FromDate] 
AND getdate()=[Expr1021] 
AND getdate()=[DB1].[dbo].[main].[FromDate] 
 AND getdate()=[DB1].[dbo].[main].[FromDate] 
 AND getdate()=[DB1].[dbo].[main].[FromDate] 
 AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
 AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)


As you can see, the only difference above and below the OR in the first part of this filter is

AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL


VS

AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL


And the second part needs to be true no matter what, as they are AND predicates without any OR's.

Resulting in extra calculations that in my opinion are not needed.

The slow plan filtering

In the 'slow' plan: sql server applies the filter directly to the
Main table as a result of the AND (TextCol1 IS NOT NULL OR TextCol2 IS NOT NULL) part and then joins with the manymany table to filter out the rest to get to 15 rows.

Main table filters

manymany table filters

Some other, sometimes overlapping information:

The slower plan

When we look at the slower plan, the clustered index PK_main is used, into a compute scalar, filter and nested loops operator:

When we compare this with the estimated rows to be returned, we see a difference:

It is estimating 93 rows to be returned by the predicate on the scan:

Which is actually about 20x less than what was expected, which is 1947 rows.

Afterwards, the Compute scalar or this statement:

```
, CASE WHEN TextC

Code Snippets

(getdate()>=[Expr1021] 
AND getdate()<=[Expr1022] 
AND getdate()>=[DB1].[dbo].[main].[FromDate] 
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL 
OR getdate()>=[Expr1021] 
AND getdate()<=[Expr1022]
 AND getdate()>=[DB1].[dbo].[main].[FromDate] 
 AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
 AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL) 

 AND (getdate()>=[DB1].[dbo].[main].[FromDate] 
 AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
 AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL OR getdate()>=[DB1].[dbo].[main].[FromDate] 
 AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
 AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)
AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL
AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL
(getdate()>=[Expr1021] 
AND getdate()<=[Expr1022] 
AND getdate()>=[DB1].[dbo].[main].[FromDate] 
AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL 
OR getdate()>=[Expr1021] 
AND getdate()<=[Expr1022]
 AND getdate()>=[DB1].[dbo].[main].[FromDate] 
 AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
 AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL) 

 AND (getdate()>=[DB1].[dbo].[main].[FromDate] 
 AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
 AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL OR getdate()>=[DB1].[dbo].[main].[FromDate] 
 AND getdate()<=isnull([DB1].[dbo].[main].[UptoDate],'2050-01-01 00:00:00.000') 
 AND [DB1].[dbo].[main].[TextCol2] IS NOT NULL)
AND [DB1].[dbo].[main].[TextCol1] IS NOT NULL

Context

StackExchange Database Administrators Q#231428, answer score: 2

Revisions (0)

No revisions yet.