gotchasqlModerate
Why does nested loops join only support left joins?
Viewed 0 times
leftwhyloopsjoinnesteddoesjoinsonlysupport
Problem
In Craig Freedman's blog, Nested Loops Join, he explains why the nested loops join cannot support a right outer join:
The problem is that we scan the inner table multiple times – once for
each row of the outer join. We may encounter the same inner rows
multiple times during these multiple scans.
At what point can we conclude that a particular inner row has not or will not join?
Can someone please explain this in a really simple and educational way?
Does it mean that the loop starts with the outer table (
I understand that for a
SQL Server does in fact optimize (and often replaces)
The problem is that we scan the inner table multiple times – once for
each row of the outer join. We may encounter the same inner rows
multiple times during these multiple scans.
At what point can we conclude that a particular inner row has not or will not join?
Can someone please explain this in a really simple and educational way?
Does it mean that the loop starts with the outer table (
R1) and the scans the inner (R2)?I understand that for a
R1 value that doesn't join with R2, it should be replaced with a NULL so the result set becomes (NULL, R2). For me it seems impossible to return an R2 value when R1 does not join, for the reason that it cannot know which R2 value to return. But that's not the way it's explained. Or is it?SQL Server does in fact optimize (and often replaces)
RIGHT JOIN with LEFT JOIN, but the question is to explain why it's technically impossible for a NESTED LOOPS JOIN to use/support RIGHT JOIN logic.Solution
What I don't like in the linked article is the statement that "nested loop join algorithm does not support the logical join operator of right join".
While there is a limitation, the wording at this point is a bit confusing. I hope the following explains things better:
The nested lop join algorithm involves two tables (whether base tables or result sets of previous operations is irrelevant) which are named outer and inner table and they are treated in a different way by the algorithm (the "outer" table is traversed at the outer loop and the "inner" table at the inner loops).
So, lets say we have have a join:
The algorithm can be executed as either:
or:
Now, if (
But when
and not
And since a
and not the other way around*.
The same limitation applies for left-semijoin, left-anti-semijoin, right-semijoin and right-anti-semijoin.
The
* As Dudu Markovitz explains in his answer, the reverse way would be able to be used but only if we modified the nested-loop join algorithm to have an extra structure and an extra step in the end.
While there is a limitation, the wording at this point is a bit confusing. I hope the following explains things better:
The nested lop join algorithm involves two tables (whether base tables or result sets of previous operations is irrelevant) which are named outer and inner table and they are treated in a different way by the algorithm (the "outer" table is traversed at the outer loop and the "inner" table at the inner loops).
So, lets say we have have a join:
A (some_type) JOIN BThe algorithm can be executed as either:
outer-loop-A nested-loop inner-loop-Bor:
outer-loop-B nested-loop inner-loop-ANow, if (
some_type) is INNER or CROSS join, then there is no limitation, the planner can choose between either of the two ways (with different performance characteristics, depending on the size of the sets, distribution of values of the joined columns, indexes, etc. Usually the smallest table will be chosen to be the "outer" table in the algorithm).But when
some_type is LEFT join, it can only use:outer-loop-A nested-loop inner-loop-Band not
outer-loop-B nested-loop inner-loop-AAnd since a
RIGHT can always be rewritten as a LEFT join, it has the same limitation, in reverse. For A RIGHT JOIN B (which can be rewritten a B LEFT JOIN A) it can only use: outer-loop-B nested-loop inner-loop-Aand not the other way around*.
The same limitation applies for left-semijoin, left-anti-semijoin, right-semijoin and right-anti-semijoin.
The
FULL join on the other hand cannot be directly handled with a nested loop join algorithm. The article explains very well (it's near the end) how a full join can be rewritten (and is by the optimizer) to a union of a left join and a left anti-semijoin which then might be planned as two nested loops (and a union).* As Dudu Markovitz explains in his answer, the reverse way would be able to be used but only if we modified the nested-loop join algorithm to have an extra structure and an extra step in the end.
Code Snippets
A (some_type) JOIN Bouter-loop-A nested-loop inner-loop-Bouter-loop-B nested-loop inner-loop-Aouter-loop-A nested-loop inner-loop-Bouter-loop-B nested-loop inner-loop-AContext
StackExchange Database Administrators Q#161849, answer score: 14
Revisions (0)
No revisions yet.