HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlModerate

Why won't SQL Server optimize the UNIONs?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whytheserversqloptimizeunionswon

Problem

Consider these queries (SQL Fiddle):

Query 1:

SELECT * INTO #TMP1 FROM Foo
UNION
SELECT * FROM Boo
UNION
SELECT * FROM Koo;


Query 2:

SELECT * INTO #TMP2 FROM Foo
UNION
SELECT * FROM Boo
UNION ALL
SELECT * FROM Koo;


Note that Koo does not overlap with Boo/Foo, so the end result is the same. The question is why the first UNION / UNION combination is not merged into a single SORT operation?

Solution

The query optimizer does have n-ary operators, though the execution engine has rather fewer. To illustrate, I'm going to use a simplified version of your tables - (SQL Fiddle).

SELECT DISTINCT
    number
INTO foo
FROM master..spt_values
WHERE 
    number  1006;

ALTER TABLE dbo.foo ADD PRIMARY KEY (number);
ALTER TABLE dbo.boo ADD PRIMARY KEY (number);
ALTER TABLE dbo.koo ADD PRIMARY KEY (number);


Given those tables and data, let's look at the input tree for a three-way UNION query:

SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8605);

LogOp_Union
    OUTPUT(COL: Union1006 )
    CHILD(QCOL: [f].number)
    CHILD(QCOL: [b].number)
    CHILD(QCOL: [k].number)
    LogOp_Project
        LogOp_Get TBL: dbo.foo(alias TBL: f)
        AncOp_PrjList 
    LogOp_Project
        LogOp_Get TBL: dbo.boo(alias TBL: b)
        AncOp_PrjList 
    LogOp_Project
        LogOp_Get TBL: dbo.koo(alias TBL: k)
        AncOp_PrjList


The logical union operator has one output and three child inputs. After cost-based optimization, the physical tree chosen is a merge union with three inputs:

SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8607);

PhyOp_MergeUnion
    PhyOp_Range TBL: dbo.foo(alias TBL: f)(1) ASC
    PhyOp_Range TBL: dbo.boo(alias TBL: b)(1) ASC
    PhyOp_Range TBL: dbo.koo(alias TBL: k)(1) ASC


The optimizer's output is reworked into a form that the execution engine (without n-ary merge union) can handle:

The post-optimization rewrite unfolds the n-ary PhyOp_MergeUnion into multiple Merge Union operators. Notice how all the estimated cost remains associated with the 'original' union operator - the others have a zero cost estimate.

That the optimizer reasons about unions using n-ary operators provides one explanation for why it does not consider rewriting your first example to the same plan as the second example (the three-way union is a single tree node).

The second reason is there are no constraints to enforce the 'lack of overlap'. Before constraints are in place, a union between boo and koo cannot be guaranteed not to produce duplicates, so we get a duplicate-removing plan (a Merge Union in this case):

SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k;


Adding the following constraints ensures the non-overlap condition cannot be violated without invalidating cached plans for the query:

ALTER TABLE dbo.foo WITH CHECK ADD CHECK (number  1006);


Now it is safe for the optimizer to simply concatenate:

However, even with those constraints in place, the three-way union query still appears as three unions because the optimizer does not normally consider splitting n-ary operators to explore alternatives. The n-ary operator thing is very useful in keeping the search space under control; breaking it apart would often be counter-productive given the optimizer's goal of finding a good plan quickly.

SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k;


When written as a UNION and UNION ALL, an n-ary operator can no longer be used (the types do not match) so the tree has separate nodes:

SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION ALL
SELECT k.number FROM dbo.koo AS k
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8605);

LogOp_UnionAll
    OUTPUT(COL: Union1007 )
    CHILD(COL: Union1004 )
    CHILD(QCOL: [k].number)

    LogOp_Union
        OUTPUT(COL: Union1004 )
        CHILD(QCOL: [f].number)
        CHILD(QCOL: [b].number)

        LogOp_Project
            LogOp_Get TBL: dbo.foo(alias TBL: f)
            AncOp_PrjList 

        LogOp_Project
            LogOp_Get TBL: dbo.boo(alias TBL: b)
            AncOp_PrjList 

    LogOp_Project
        LogOp_Get TBL: dbo.koo(alias TBL: k)
        AncOp_PrjList

Code Snippets

SELECT DISTINCT
    number
INTO foo
FROM master..spt_values
WHERE 
    number < 1000;

SELECT DISTINCT
    number
INTO boo
FROM master..spt_values
WHERE 
    number between 300 and 1005;

SELECT DISTINCT
    number
INTO koo
FROM master..spt_values
WHERE 
    number > 1006;

ALTER TABLE dbo.foo ADD PRIMARY KEY (number);
ALTER TABLE dbo.boo ADD PRIMARY KEY (number);
ALTER TABLE dbo.koo ADD PRIMARY KEY (number);
SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8605);

LogOp_Union
    OUTPUT(COL: Union1006 )
    CHILD(QCOL: [f].number)
    CHILD(QCOL: [b].number)
    CHILD(QCOL: [k].number)
    LogOp_Project
        LogOp_Get TBL: dbo.foo(alias TBL: f)
        AncOp_PrjList 
    LogOp_Project
        LogOp_Get TBL: dbo.boo(alias TBL: b)
        AncOp_PrjList 
    LogOp_Project
        LogOp_Get TBL: dbo.koo(alias TBL: k)
        AncOp_PrjList
SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8607);

PhyOp_MergeUnion
    PhyOp_Range TBL: dbo.foo(alias TBL: f)(1) ASC
    PhyOp_Range TBL: dbo.boo(alias TBL: b)(1) ASC
    PhyOp_Range TBL: dbo.koo(alias TBL: k)(1) ASC
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k;
ALTER TABLE dbo.foo WITH CHECK ADD CHECK (number < 1000);
ALTER TABLE dbo.boo WITH CHECK ADD CHECK (number BETWEEN 300 AND 1005);
ALTER TABLE dbo.koo WITH CHECK ADD CHECK (number > 1006);

Context

StackExchange Database Administrators Q#30165, answer score: 19

Revisions (0)

No revisions yet.