HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlModerate

Why does nested loops join only support left joins?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
leftwhyloopsjoinnesteddoesjoinsonlysupport

Problem

In Craig Freedman's blog, Nested Loops Join, he explains why the nested loops join cannot support a right outer join:


The problem is that we scan the inner table multiple times – once for
each row of the outer join. We may encounter the same inner rows
multiple times during these multiple scans.
At what point can we conclude that a particular inner row has not or will not join?

Can someone please explain this in a really simple and educational way?

Does it mean that the loop starts with the outer table (R1) and the scans the inner (R2)?

I understand that for a R1 value that doesn't join with R2, it should be replaced with a NULL so the result set becomes (NULL, R2). For me it seems impossible to return an R2 value when R1 does not join, for the reason that it cannot know which R2 value to return. But that's not the way it's explained. Or is it?

SQL Server does in fact optimize (and often replaces) RIGHT JOIN with LEFT JOIN, but the question is to explain why it's technically impossible for a NESTED LOOPS JOIN to use/support RIGHT JOIN logic.

Solution

What I don't like in the linked article is the statement that "nested loop join algorithm does not support the logical join operator of right join".

While there is a limitation, the wording at this point is a bit confusing. I hope the following explains things better:

The nested lop join algorithm involves two tables (whether base tables or result sets of previous operations is irrelevant) which are named outer and inner table and they are treated in a different way by the algorithm (the "outer" table is traversed at the outer loop and the "inner" table at the inner loops).

So, lets say we have have a join:

A (some_type) JOIN B


The algorithm can be executed as either:

outer-loop-A  nested-loop  inner-loop-B


or:

outer-loop-B  nested-loop  inner-loop-A


Now, if (some_type) is INNER or CROSS join, then there is no limitation, the planner can choose between either of the two ways (with different performance characteristics, depending on the size of the sets, distribution of values of the joined columns, indexes, etc. Usually the smallest table will be chosen to be the "outer" table in the algorithm).

But when some_type is LEFT join, it can only use:

outer-loop-A  nested-loop  inner-loop-B


and not

outer-loop-B  nested-loop  inner-loop-A


And since a RIGHT can always be rewritten as a LEFT join, it has the same limitation, in reverse. For A RIGHT JOIN B (which can be rewritten a B LEFT JOIN A) it can only use:

outer-loop-B  nested-loop  inner-loop-A


and not the other way around*.

The same limitation applies for left-semijoin, left-anti-semijoin, right-semijoin and right-anti-semijoin.

The FULL join on the other hand cannot be directly handled with a nested loop join algorithm. The article explains very well (it's near the end) how a full join can be rewritten (and is by the optimizer) to a union of a left join and a left anti-semijoin which then might be planned as two nested loops (and a union).

* As Dudu Markovitz explains in his answer, the reverse way would be able to be used but only if we modified the nested-loop join algorithm to have an extra structure and an extra step in the end.

Code Snippets

A (some_type) JOIN B
outer-loop-A  nested-loop  inner-loop-B
outer-loop-B  nested-loop  inner-loop-A
outer-loop-A  nested-loop  inner-loop-B
outer-loop-B  nested-loop  inner-loop-A

Context

StackExchange Database Administrators Q#161849, answer score: 14

Revisions (0)

No revisions yet.