patternsqlMinor
Joining two subqueries with union all and join leads to suboptimal execution plan
Viewed 0 times
allwithsuboptimalunionjoinexecutiontwoplanandleads
Problem
I've faced the similar problem on my production environment, but I managed to reproduce this behavior on Northwind DB.
Consider the following query:
It has pretty good query plan:
Now when I uncomment line with
When I add
On the other hand, when I remove one of the
Is that expected behavior? Why does the query processor not put filter by
Consider the following query:
USE NORTHWND;
DECLARE @ids TABLE ( Id NCHAR(50) );
INSERT INTO @ids
VALUES ( N'AROUT' ),
( N'ALFKI' );
SELECT *
FROM ( SELECT c.CustomerID
FROM dbo.Customers c
WHERE c.CustomerID IN ( SELECT Id
FROM @ids )
UNION ALL
SELECT '0'
) t1
JOIN ( SELECT o.CustomerID
FROM dbo.Orders o
--LEFT JOIN dbo.[Order Details] od ON o.OrderID = od.OrderID
UNION ALL
SELECT '0'
) t2 ON t2.CustomerID = t1.CustomerID
OPTION ( RECOMPILE );It has pretty good query plan:
Now when I uncomment line with
LEFT JOIN, the plan becomes not as good (seems like query processor can't push the CustomerID filter to the data access operators):When I add
FORCESEEK hint on dbo.Orders, query processor can't produce a query plan. On the other hand, when I remove one of the
UNION ALL, the query plan becomes as good as the first one.Is that expected behavior? Why does the query processor not put filter by
CustomerID before the join operator?Solution
I think that SQL Server simply does not have the appropriate optimization rules to yield the seek-into-
In order to reach this conclusion, I compared the full original query (including the
The original query plan
Here is the original plan, with an estimated cost of
The alternate query plan
This query is semantically equivalent to the original query but has an estimated cost of
The alternate query
Here is the full query you can use to play around with this approach:
Takeaways / caveats
Orders query that you are looking for in the case of a UNION ALL on both sides of the query. Such a query plan is theoretically possible, but the query optimizer is not capable of producing it for your query.In order to reach this conclusion, I compared the full original query (including the
LEFT JOIN) to an alternate formulation of the query that yields a much lower estimated cost (0.042 vs. 0.085). So SQL Server would be likely to choose this lower cost alternative if it were able to explore this plan shape for your query.The original query plan
Here is the original plan, with an estimated cost of
0.085.The alternate query plan
This query is semantically equivalent to the original query but has an estimated cost of
0.042, about half the cost. It is formed by promoting the second UNION ALL up to the top-level of the query. This requires us to reference t1 (the set of customers) twice, but even so yields a plan with half the cost by allowing seeks into the Orders and [Order Details] tables.The alternate query
Here is the full query you can use to play around with this approach:
DECLARE @ids TABLE ( Id NCHAR(50) );
INSERT INTO @ids
VALUES ( N'AROUT' ),
( N'ALFKI' );
-- Define the original t1 as a CTE
WITH customers AS (
SELECT c.CustomerID
FROM dbo.Customers c
WHERE c.CustomerID IN ( SELECT Id
FROM @ids )
UNION ALL
SELECT '0'
)
-- Join t1 to the top half of the original UNION ALL
SELECT *
FROM Customers t1
JOIN ( SELECT o.CustomerID
FROM dbo.Orders o
LEFT JOIN dbo.[Order Details] od ON o.OrderID = od.OrderID
) t2 ON t2.CustomerID = t1.CustomerID
-- And then join it again to the bottom half of the original UNION ALL
UNION ALL
SELECT *
FROM Customers t1
JOIN ( SELECT '0' AS CustomerId ) t2 ON t2.CustomerID = t1.CustomerID
OPTION ( RECOMPILE);Takeaways / caveats
- In general, I have found that
Concatenationoperators are good at getting in the way of the query optimizer and preventing it from yielding an optimal plan. A couple examples include this Connect issue whereUNION ALLprevents optimized bitmap filters as well as repeated observations thatConcatenationoperators, particularly if nested, fool the SQL 2012 and earlier Cardinality Estimator and may yield sub-optimal plans due to poor cardinality estimates. ObviouslyUNION ALLis a very useful tool, but it's worth being aware that it does occasionally limit the query optimizer's ability to optimize your query.
- Actually splitting the query into two separate units in this way might or might not make sense on your real use case, but it could be worth a try. (This particular query benefits from the fact that one of the two chunks can be simplified since it contains only the dummy
SELECT '0'.)
- The data sets involved on
Northwindare so small that it's difficult to draw conclusions, so testing with your real world data set will be important.
Code Snippets
DECLARE @ids TABLE ( Id NCHAR(50) );
INSERT INTO @ids
VALUES ( N'AROUT' ),
( N'ALFKI' );
-- Define the original t1 as a CTE
WITH customers AS (
SELECT c.CustomerID
FROM dbo.Customers c
WHERE c.CustomerID IN ( SELECT Id
FROM @ids )
UNION ALL
SELECT '0'
)
-- Join t1 to the top half of the original UNION ALL
SELECT *
FROM Customers t1
JOIN ( SELECT o.CustomerID
FROM dbo.Orders o
LEFT JOIN dbo.[Order Details] od ON o.OrderID = od.OrderID
) t2 ON t2.CustomerID = t1.CustomerID
-- And then join it again to the bottom half of the original UNION ALL
UNION ALL
SELECT *
FROM Customers t1
JOIN ( SELECT '0' AS CustomerId ) t2 ON t2.CustomerID = t1.CustomerID
OPTION ( RECOMPILE);Context
StackExchange Database Administrators Q#116957, answer score: 4
Revisions (0)
No revisions yet.