HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

USING construct in JOIN clause can introduce optimization barriers in certain cases?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
canintroducejoinbarriersoptimizationusingconstructcertainclausecases

Problem

It was brought to my attention that the USING construct (instead of ON) in the FROM clause of SELECT queries might introduce optimization barriers in certain cases.

I mean this key word:

SELECT *
FROM a
JOIN b USING (a_id)

Just in more complex cases.

Context: this comment to this question.

I use this a lot and have never noticed anything so far. I would be very interested in a test case demonstrating the effect or any links to further information. My search efforts came up empty.

The perfect answer would be a test case to show USING (a_id) with inferior performance when compared to the alternative join clause ON a.a_id = b.a_id - if that can actually happen.

Solution

Erwin: I would concur with the idea that USING causing rigid ordering could well create many edge cases where optimal plans would be ruled out. I recently helped someone out who had something like this in his query:

LEFT JOIN ( 
     a 
     JOIN b ON a.id = b.a_id
     JOIN c ON b.c_id = c.id
) ON a.id = something.a_id
LEFT JOIN (
     table1 t1
     JOIN table2 t2 ON t1.some_field = t2.other_field
     JOIN talbe3 t3 ON t2.yafield = t3.something_else
) ON ....
repeat a few more times


In his case the worst of these join blocks was causing a nested loop join through some 200k rows, about 20k times (do the math), and since keys couldn't be pushed to indexes, it was a sequential scan. This meant that the overall query took about 3 hours to run due to cascading plan changes. By distributing the left join, the keys could be pushed down and the query ran in a matter of seconds. Of course this isn't exactly equivalent which is why the planner can't treat them as equivalent and so it was left figuring out that plan as a hash join and then doing a nested loop in, which was painfully slow.

Any time you rigidly force the joins to go through in a certain order you introduce cases where key filter information may not be available yet in the execution of the plan, and so what might be possible to do later in a quick index scan/hash join might be have to be done much slower in a nested loop/sequential scan and so while the above fragment is not immediately equivalent, it shows the same problem.

Code Snippets

LEFT JOIN ( 
     a 
     JOIN b ON a.id = b.a_id
     JOIN c ON b.c_id = c.id
) ON a.id = something.a_id
LEFT JOIN (
     table1 t1
     JOIN table2 t2 ON t1.some_field = t2.other_field
     JOIN talbe3 t3 ON t2.yafield = t3.something_else
) ON ....
repeat a few more times

Context

StackExchange Database Administrators Q#16409, answer score: 12

Revisions (0)

No revisions yet.