patternsqlModerate
USING construct in JOIN clause can introduce optimization barriers in certain cases?
Viewed 0 times
canintroducejoinbarriersoptimizationusingconstructcertainclausecases
Problem
It was brought to my attention that the
I mean this key word:
SELECT *
FROM a
JOIN b USING (a_id)
Just in more complex cases.
Context: this comment to this question.
I use this a lot and have never noticed anything so far. I would be very interested in a test case demonstrating the effect or any links to further information. My search efforts came up empty.
The perfect answer would be a test case to show
USING construct (instead of ON) in the FROM clause of SELECT queries might introduce optimization barriers in certain cases.I mean this key word:
SELECT *
FROM a
JOIN b USING (a_id)
Just in more complex cases.
Context: this comment to this question.
I use this a lot and have never noticed anything so far. I would be very interested in a test case demonstrating the effect or any links to further information. My search efforts came up empty.
The perfect answer would be a test case to show
USING (a_id) with inferior performance when compared to the alternative join clause ON a.a_id = b.a_id - if that can actually happen.Solution
Erwin: I would concur with the idea that USING causing rigid ordering could well create many edge cases where optimal plans would be ruled out. I recently helped someone out who had something like this in his query:
In his case the worst of these join blocks was causing a nested loop join through some 200k rows, about 20k times (do the math), and since keys couldn't be pushed to indexes, it was a sequential scan. This meant that the overall query took about 3 hours to run due to cascading plan changes. By distributing the left join, the keys could be pushed down and the query ran in a matter of seconds. Of course this isn't exactly equivalent which is why the planner can't treat them as equivalent and so it was left figuring out that plan as a hash join and then doing a nested loop in, which was painfully slow.
Any time you rigidly force the joins to go through in a certain order you introduce cases where key filter information may not be available yet in the execution of the plan, and so what might be possible to do later in a quick index scan/hash join might be have to be done much slower in a nested loop/sequential scan and so while the above fragment is not immediately equivalent, it shows the same problem.
LEFT JOIN (
a
JOIN b ON a.id = b.a_id
JOIN c ON b.c_id = c.id
) ON a.id = something.a_id
LEFT JOIN (
table1 t1
JOIN table2 t2 ON t1.some_field = t2.other_field
JOIN talbe3 t3 ON t2.yafield = t3.something_else
) ON ....
repeat a few more timesIn his case the worst of these join blocks was causing a nested loop join through some 200k rows, about 20k times (do the math), and since keys couldn't be pushed to indexes, it was a sequential scan. This meant that the overall query took about 3 hours to run due to cascading plan changes. By distributing the left join, the keys could be pushed down and the query ran in a matter of seconds. Of course this isn't exactly equivalent which is why the planner can't treat them as equivalent and so it was left figuring out that plan as a hash join and then doing a nested loop in, which was painfully slow.
Any time you rigidly force the joins to go through in a certain order you introduce cases where key filter information may not be available yet in the execution of the plan, and so what might be possible to do later in a quick index scan/hash join might be have to be done much slower in a nested loop/sequential scan and so while the above fragment is not immediately equivalent, it shows the same problem.
Code Snippets
LEFT JOIN (
a
JOIN b ON a.id = b.a_id
JOIN c ON b.c_id = c.id
) ON a.id = something.a_id
LEFT JOIN (
table1 t1
JOIN table2 t2 ON t1.some_field = t2.other_field
JOIN talbe3 t3 ON t2.yafield = t3.something_else
) ON ....
repeat a few more timesContext
StackExchange Database Administrators Q#16409, answer score: 12
Revisions (0)
No revisions yet.