snippetsqlMinor
How to prevent PostgreSQL from rewriting OUTER JOIN queries?
Viewed 0 times
postgresqlpreventhowjoinrewritingqueriesfromouter
Problem
My query is:
Postgres changed the full outerjoin between
However, I am hacking Postgres to implement my join enumeration related algorithm and to do experiments. I do not want Postgres to change the full outerjoin to a left outerjoin. Is there a way to do so?
SELECT Acol1, Acol2, Bcol1, Bcol2, Ccol1, Ccol2
FROM tableA LEFT JOIN
(tableB FULL JOIN tableC ON (Bcol1 = Ccol1))
ON (Acol1 = Bcol1)EXPLAIN ANALYZE gives me:QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=99.65..180.45 rows=1770 width=24) (actual time=0.043..0.103 rows=3 loops=1)
Hash Cond: (tableb.bcol1 = tablea.acol1)
-> Hash Left Join (cost=49.83..104.08 rows=1770 width=16) (actual time=0.011..0.062 rows=3 loops=1)
Hash Cond: (tableb.bcol1 = tablec.ccol1)
-> Seq Scan on tableb (cost=0.00..27.70 rows=1770 width=8) (actual time=0.001..0.002 rows=3 loops=1)
-> Hash (cost=27.70..27.70 rows=1770 width=8) (actual time=0.004..0.004 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on tablec (cost=0.00..27.70 rows=1770 width=8) (actual time=0.001..0.002 rows=3 loops=1)
-> Hash (cost=27.70..27.70 rows=1770 width=8) (actual time=0.014..0.014 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on tablea (cost=0.00..27.70 rows=1770 width=8) (actual time=0.009..0.011 rows=3 loops=1)
Total runtime: 0.151 msPostgres changed the full outerjoin between
tableB and tableC to a right outerjoin, since later the left outerjoin with tableA will eliminate null values anyway. It is equivalent to the original query.However, I am hacking Postgres to implement my join enumeration related algorithm and to do experiments. I do not want Postgres to change the full outerjoin to a left outerjoin. Is there a way to do so?
Solution
You can introduce optimization barriers for your purpose.
Preface
Plain
We do see "Full Join" for the full join by itself:
We could rewrite the full join with a subquery:
You have to sort out conflicting names with aliases in the subquery - but then again, you would need to do this in any case in the outer
The query is still optimized as a whole since subqueries do not impose optimization barriers. You'll still see a "Left Join" or "Right Join". However, we can expand on this form to arrive at a solution:
Solution 1.
You'll see a "Full Join".
Why? As soon as a subquery uses an
Solution 2. Use a CTE (documented)
You'll see a "Full Join" as well.
The manual for Postgres 11 (before
A useful property of
per execution of the parent query, even if they are referred to more
than once by the parent query or sibling
calculations that are needed in multiple places can be placed within a
WITH query to avoid redundant work. Another possible application is to
prevent unwanted multiple evaluations of functions with side-effects.
However, the other side of this coin is that the optimizer is less
able to push restrictions from the parent query down into a WITH query
than an ordinary subquery. The
as written, without suppression of rows that the parent query might
discard afterwards. (But, as mentioned above, evaluation might stop
early if the reference(s) to the query demand only a limited number of rows.)
Since Postgres 12, the manual adds:
However, if a
is, it is a
folded into the parent query, allowing joint optimization of the two
query levels. By default, this happens if the parent query references
the
more than once. You can override that decision by specifying
the parent query. The latter choice risks duplicate computation of the
output.
Bold emphasis mine.
db<>fiddle here
Old sqlfiddle
Preface
Plain
EXPLAIN (without ANALYZE) is enough for the purpose of this question. Parentheses around the ON expression are just noise. Adding table aliases to be unambiguous.We do see "Full Join" for the full join by itself:
SELECT * FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1;We could rewrite the full join with a subquery:
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
SELECT * -- sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
) d ON a.Acol1 = d.Bcol1;You have to sort out conflicting names with aliases in the subquery - but then again, you would need to do this in any case in the outer
SELECT for multiple columns of the same name in the underlying tables.The query is still optimized as a whole since subqueries do not impose optimization barriers. You'll still see a "Left Join" or "Right Join". However, we can expand on this form to arrive at a solution:
Solution 1.
OFFSET 0 hack (undocumented)EXPLAIN
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
SELECT * -- you'll have to sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
OFFSET 0 -- undocumented hack
) d ON a.Acol1 = d.Bcol1;You'll see a "Full Join".
Why? As soon as a subquery uses an
OFFSET clause, the query planner / optimiser plans the subquery separately. OFFSET 0 is logical noise, but Postgres still considers the clause which makes this a query hint to materialize the subquery, effectively. (Even though Postgres does not otherwise support query hints.) It's a much debated issue. Related:- How can I prevent Postgres from inlining a subquery?
- http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/
Solution 2. Use a CTE (documented)
EXPLAIN
WITH cte AS MATERIALIZED ( -- requires "MATERIALIZED" in Postgres 12 or later!
SELECT * -- you'll have to sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
)
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN cte d ON a.Acol1 = d.Bcol1;You'll see a "Full Join" as well.
The manual for Postgres 11 (before
AS MATERIALIZED was added):A useful property of
WITH queries is that they are evaluated only onceper execution of the parent query, even if they are referred to more
than once by the parent query or sibling
WITH queries. Thus, expensivecalculations that are needed in multiple places can be placed within a
WITH query to avoid redundant work. Another possible application is to
prevent unwanted multiple evaluations of functions with side-effects.
However, the other side of this coin is that the optimizer is less
able to push restrictions from the parent query down into a WITH query
than an ordinary subquery. The
WITH query will generally be evaluatedas written, without suppression of rows that the parent query might
discard afterwards. (But, as mentioned above, evaluation might stop
early if the reference(s) to the query demand only a limited number of rows.)
Since Postgres 12, the manual adds:
However, if a
WITH query is non-recursive and side-effect-free (thatis, it is a
SELECT containing no volatile functions) then it can befolded into the parent query, allowing joint optimization of the two
query levels. By default, this happens if the parent query references
the
WITH query just once, but not if it references the WITH querymore than once. You can override that decision by specifying
MATERIALIZED to force separate calculation of the WITH query, or by specifying NOT MATERIALIZED to force it to be merged intothe parent query. The latter choice risks duplicate computation of the
WITH query, but it can still give a net savings if each usage of theWITH query needs only a small part of the WITH query's fulloutput.
Bold emphasis mine.
db<>fiddle here
Old sqlfiddle
Code Snippets
SELECT * FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1;SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
SELECT * -- sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
) d ON a.Acol1 = d.Bcol1;EXPLAIN
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
SELECT * -- you'll have to sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
OFFSET 0 -- undocumented hack
) d ON a.Acol1 = d.Bcol1;EXPLAIN
WITH cte AS MATERIALIZED ( -- requires "MATERIALIZED" in Postgres 12 or later!
SELECT * -- you'll have to sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
)
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN cte d ON a.Acol1 = d.Bcol1;Context
StackExchange Database Administrators Q#131384, answer score: 5
Revisions (0)
No revisions yet.