HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Should I nest dependent outer joins in SQL Server?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sqlnestshouldserverouterdependentjoins

Problem

I have heard mixed information on this and am hoping for a canonical or expert opinion.

If I have multiple LEFT OUTER JOINs , each dependent on the last, is it better to nest them?

For a contrived example, the JOIN to MyParent depends on the JOIN to MyChild:
http://sqlfiddle.com/#!3/31022/5

SELECT
    {columns}
FROM
    MyGrandChild AS gc
LEFT OUTER JOIN
    MyChild AS c
        ON c.[Id] = gc.[ParentId]
LEFT OUTER JOIN
    MyParent AS p
        ON p.[id] = c.[ParentId]


Compared to http://sqlfiddle.com/#!3/31022/7

SELECT
    {columns}
FROM
    MyGrandChild AS gc
LEFT OUTER JOIN
    (
    MyChild AS c            
    LEFT OUTER JOIN
        MyParent AS p
            ON p.[id] = c.[ParentId]
    )
    ON c.[Id] = gc.[ParentId]


As shown above these produce different query plans in SS2k8

Solution

This is absolutely not a canonical answer but I noticed that for the nested loops query plans shown in the SQL Fiddle it was possible to apply the plan from Query 2 to Query 1 with the use of the USE PLAN hint but attempting the reverse operation fails with


Query processor could not produce query plan because USE PLAN hint
contains plan that could not be verified to be legal for query. Remove
or replace USE PLAN hint. For best likelihood of successful plan
forcing, verify that the plan provided in the USE PLAN hint is one
generated automatically by SQL Server for the same query.

Disabling the optimizer transformation rule ReorderLOJN prevents the previously successful plan hint from succeeding too.

Experimenting with greater quantities of data shows that SQL Server is certainly capable of transforming (A LOJ B) LOJ C to A LOJ (B LOJ C) naturally as well but I didn't see any evidence that the reverse is true.

A very contrived case where the first query performs better than the second is

DROP TABLE  MyGrandChild , MyChild,  MyParent

CREATE TABLE MyParent
(Id int)

CREATE TABLE MyChild
(Id int PRIMARY KEY
,ParentId int,
Filler char(8000) NULL)

CREATE TABLE MyGrandChild
(Id int
,ParentId int)

INSERT INTO MyChild
                      (Id, ParentId)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY @@SPID),
                     ROW_NUMBER() OVER (ORDER BY @@SPID)    
FROM master..spt_values  v1, master..spt_values                  

INSERT INTO MyGrandChild
                      (Id, ParentId)
OUTPUT INSERTED.Id INTO MyParent
SELECT TOP (3000) Id, Id AS ParentId
FROM MyChild
ORDER BY Id

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT gc.Id       AS gcId,
       gc.ParentId AS gcpId,
       c.Id        AS cId,
       c.ParentId  AS cpId,
       p.Id        AS pId
FROM   MyGrandChild AS gc
       LEFT OUTER JOIN MyChild AS c
         ON c.[Id] = gc.[ParentId]
       LEFT OUTER JOIN MyParent AS p
         ON p.[Id] = c.[ParentId]

SELECT gc.Id       AS gcId,
       gc.ParentId AS gcpId,
       c.Id        AS cId,
       c.ParentId  AS cpId,
       p.Id        AS pId
FROM   MyGrandChild AS gc
       LEFT OUTER JOIN( MyChild AS c
                        LEFT OUTER JOIN MyParent AS p
                          ON p.[Id] = c.[ParentId])
         ON c.[Id] = gc.[ParentId]


Which gives plans

For me Query 1 had an elapsed time of 108 ms vs 1,163 ms for Query 2.

Query 1

Table 'Worktable'. Scan count 0, logical reads 0 
Table 'MyChild'. Scan count 0, logical reads 9196
Table 'MyGrandChild'. Scan count 1, logical reads 7
Table 'MyParent'. Scan count 1, logical reads 5


Query 2

Table 'MyParent'. Scan count 1, logical reads 15000
Table 'MyChild'. Scan count 0, logical reads 9000 
Table 'MyGrandChild'. Scan count 1, logical reads 7


So it might be provisionally assumed that the first ("unnested") syntax is potentially beneficial as it allows more potential join orders to be considered but I haven't done exhaustive enough testing to have much confidence in this as a general rule.

It may well be entirely possible to come up with counter examples where Query 2 performs better. Try both and look at the execution plans.

Code Snippets

DROP TABLE  MyGrandChild , MyChild,  MyParent

CREATE TABLE MyParent
(Id int)

CREATE TABLE MyChild
(Id int PRIMARY KEY
,ParentId int,
Filler char(8000) NULL)

CREATE TABLE MyGrandChild
(Id int
,ParentId int)

INSERT INTO MyChild
                      (Id, ParentId)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY @@SPID),
                     ROW_NUMBER() OVER (ORDER BY @@SPID)    
FROM master..spt_values  v1, master..spt_values                  

INSERT INTO MyGrandChild
                      (Id, ParentId)
OUTPUT INSERTED.Id INTO MyParent
SELECT TOP (3000) Id, Id AS ParentId
FROM MyChild
ORDER BY Id

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT gc.Id       AS gcId,
       gc.ParentId AS gcpId,
       c.Id        AS cId,
       c.ParentId  AS cpId,
       p.Id        AS pId
FROM   MyGrandChild AS gc
       LEFT OUTER JOIN MyChild AS c
         ON c.[Id] = gc.[ParentId]
       LEFT OUTER JOIN MyParent AS p
         ON p.[Id] = c.[ParentId]

SELECT gc.Id       AS gcId,
       gc.ParentId AS gcpId,
       c.Id        AS cId,
       c.ParentId  AS cpId,
       p.Id        AS pId
FROM   MyGrandChild AS gc
       LEFT OUTER JOIN( MyChild AS c
                        LEFT OUTER JOIN MyParent AS p
                          ON p.[Id] = c.[ParentId])
         ON c.[Id] = gc.[ParentId]
Table 'Worktable'. Scan count 0, logical reads 0 
Table 'MyChild'. Scan count 0, logical reads 9196
Table 'MyGrandChild'. Scan count 1, logical reads 7
Table 'MyParent'. Scan count 1, logical reads 5
Table 'MyParent'. Scan count 1, logical reads 15000
Table 'MyChild'. Scan count 0, logical reads 9000 
Table 'MyGrandChild'. Scan count 1, logical reads 7

Context

StackExchange Database Administrators Q#58602, answer score: 3

Revisions (0)

No revisions yet.