HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to prevent PostgreSQL from rewriting OUTER JOIN queries?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlpreventhowjoinrewritingqueriesfromouter

Problem

My query is:

SELECT  Acol1, Acol2, Bcol1, Bcol2, Ccol1, Ccol2
FROM    tableA LEFT JOIN
            (tableB FULL JOIN tableC ON (Bcol1 = Ccol1))
            ON (Acol1 = Bcol1)


EXPLAIN ANALYZE gives me:

QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=99.65..180.45 rows=1770 width=24) (actual time=0.043..0.103 rows=3 loops=1)
   Hash Cond: (tableb.bcol1 = tablea.acol1)
   ->  Hash Left Join  (cost=49.83..104.08 rows=1770 width=16) (actual time=0.011..0.062 rows=3 loops=1)
         Hash Cond: (tableb.bcol1 = tablec.ccol1)
         ->  Seq Scan on tableb  (cost=0.00..27.70 rows=1770 width=8) (actual time=0.001..0.002 rows=3 loops=1)
         ->  Hash  (cost=27.70..27.70 rows=1770 width=8) (actual time=0.004..0.004 rows=3 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               ->  Seq Scan on tablec  (cost=0.00..27.70 rows=1770 width=8) (actual time=0.001..0.002 rows=3 loops=1)
   ->  Hash  (cost=27.70..27.70 rows=1770 width=8) (actual time=0.014..0.014 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on tablea  (cost=0.00..27.70 rows=1770 width=8) (actual time=0.009..0.011 rows=3 loops=1)
 Total runtime: 0.151 ms


Postgres changed the full outerjoin between tableB and tableC to a right outerjoin, since later the left outerjoin with tableA will eliminate null values anyway. It is equivalent to the original query.

However, I am hacking Postgres to implement my join enumeration related algorithm and to do experiments. I do not want Postgres to change the full outerjoin to a left outerjoin. Is there a way to do so?

Solution

You can introduce optimization barriers for your purpose.
Preface

Plain EXPLAIN (without ANALYZE) is enough for the purpose of this question. Parentheses around the ON expression are just noise. Adding table aliases to be unambiguous.

We do see "Full Join" for the full join by itself:

SELECT * FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1;


We could rewrite the full join with a subquery:

SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM   tableA a
LEFT   JOIN (
   SELECT *  -- sort out conflicting names with aliases
   FROM   tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
   ) d ON a.Acol1 = d.Bcol1;


You have to sort out conflicting names with aliases in the subquery - but then again, you would need to do this in any case in the outer SELECT for multiple columns of the same name in the underlying tables.

The query is still optimized as a whole since subqueries do not impose optimization barriers. You'll still see a "Left Join" or "Right Join". However, we can expand on this form to arrive at a solution:
Solution 1. OFFSET 0 hack (undocumented)

EXPLAIN
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM   tableA a
LEFT   JOIN (
   SELECT *  -- you'll have to sort out conflicting names with aliases
   FROM   tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
   OFFSET 0  -- undocumented hack
   ) d ON a.Acol1 = d.Bcol1;


You'll see a "Full Join".

Why? As soon as a subquery uses an OFFSET clause, the query planner / optimiser plans the subquery separately. OFFSET 0 is logical noise, but Postgres still considers the clause which makes this a query hint to materialize the subquery, effectively. (Even though Postgres does not otherwise support query hints.) It's a much debated issue. Related:

  • How can I prevent Postgres from inlining a subquery?



  • http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/



Solution 2. Use a CTE (documented)

EXPLAIN
WITH cte AS MATERIALIZED ( -- requires "MATERIALIZED" in Postgres 12 or later!
   SELECT *  -- you'll have to sort out conflicting names with aliases
   FROM   tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
   ) 
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM   tableA a
LEFT   JOIN cte d ON a.Acol1 = d.Bcol1;


You'll see a "Full Join" as well.

The manual for Postgres 11 (before AS MATERIALIZED was added):

A useful property of WITH queries is that they are evaluated only once
per execution of the parent query, even if they are referred to more
than once by the parent query or sibling WITH queries. Thus, expensive
calculations that are needed in multiple places can be placed within a
WITH query to avoid redundant work. Another possible application is to
prevent unwanted multiple evaluations of functions with side-effects.
However, the other side of this coin is that the optimizer is less
able to push restrictions from the parent query down into a WITH query
than an ordinary subquery. The WITH query will generally be evaluated
as written, without suppression of rows that the parent query might
discard afterwards. (But, as mentioned above, evaluation might stop
early if the reference(s) to the query demand only a limited number of rows.)

Since Postgres 12, the manual adds:

However, if a WITH query is non-recursive and side-effect-free (that
is, it is a SELECT containing no volatile functions) then it can be
folded into the parent query, allowing joint optimization of the two
query levels. By default, this happens if the parent query references
the WITH query just once, but not if it references the WITH query
more than once. You can override that decision by specifying
MATERIALIZED to force separate calculation of the WITH query, or by specifying NOT MATERIALIZED to force it to be merged into
the parent query. The latter choice risks duplicate computation of the
WITH query, but it can still give a net savings if each usage of the
WITH query needs only a small part of the WITH query's full
output.

Bold emphasis mine.

db<>fiddle here

Old sqlfiddle

Code Snippets

SELECT * FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1;
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM   tableA a
LEFT   JOIN (
   SELECT *  -- sort out conflicting names with aliases
   FROM   tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
   ) d ON a.Acol1 = d.Bcol1;
EXPLAIN
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM   tableA a
LEFT   JOIN (
   SELECT *  -- you'll have to sort out conflicting names with aliases
   FROM   tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
   OFFSET 0  -- undocumented hack
   ) d ON a.Acol1 = d.Bcol1;
EXPLAIN
WITH cte AS MATERIALIZED ( -- requires "MATERIALIZED" in Postgres 12 or later!
   SELECT *  -- you'll have to sort out conflicting names with aliases
   FROM   tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
   ) 
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM   tableA a
LEFT   JOIN cte d ON a.Acol1 = d.Bcol1;

Context

StackExchange Database Administrators Q#131384, answer score: 5

Revisions (0)

No revisions yet.