HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Postgres WITH not performed first

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
performedpostgreswithfirstnot

Problem

I am working in PostgreSQL 9.5 on a SQL query that pulls from two views. Individually, the views take 93 msec and 530 msec to perform, resulting in ~1500 and ~6200 rows, respectively. This seems reasonable given the processing going into the view, which includes some spatial intersections with PostGIS. However, when I perform the query below, it takes ~5 seconds to perform.

From what I understand, using WITH should force the subqueries to be executed once each prior to evaluation in the query below, which should result in a total time under one second rather than five seconds.

If I link to the two views through ODBC/MS Access and use MS Access to perform the query without WITH, it executes in less than a second. This makes me think that the subqueries are not being performed once each prior to joining when I execute them in pgAdmin.

No other CTEs in the views, just subqueries.

Is there a better way to do this, or a hack to force the order of execution?

WITH 
  eu AS (
    SELECT
      e.sample_pk,
      .....
    FROM
      public.samples_exp_unit AS e
    ),
  sr AS (
    SELECT
      r.sample_pk,
      .....
    FROM
      public.sample_results AS r
    )

SELECT
  sr.sample_pk,
  eu.inventory_pk,
  .....
FROM sr INNER JOIN eu
ON sr.sample_pk = eu.sample_pk


Execution plan of full query:

`"Hash Join (cost=963.49..1007.02 rows=9 width=120) (actual time=40.508..54.209 rows=45392 loops=1)"
" Hash Cond: (sr.sample_pk = eu.sample_pk)"
" CTE eu"
" -> Nested Loop (cost=41.81..50.67 rows=1 width=24) (actual time=1.296..8.307 rows=1538 loops=1)"
" Join Filter: (lil.locations_pk = l.locations_pk)"
" -> Nested Loop (cost=41.67..48.22 rows=1 width=28) (actual time=1.289..6.582 rows=1538 loops=1)"
" -> Nested Loop (cost=41.53..47.92 rows=1 width=32) (actual time=1.283..4.852 rows=1538 loops=1)"
" -> Hash Join (cost=41.38..47.66 rows=1 width=28) (actual time=1.275..3.079 rows=1480 loops=1)"

Solution

Since public.samples_exp_unit and public.sample_results are the views, you have added a level of indirection in your queries.

You can rewrite to be simpler:

WITH 
  eu AS (TABLE public.samples_exp_unit),
  sr AS (TABLE public.sample_results)
SELECT
  sr.sample_pk,
  eu.inventory_pk,
  .....
FROM sr INNER JOIN eu
  ON sr.sample_pk = eu.sample_pk ;


and test if this changes the plan and improves efficiency.

Another thing to test is to avoid CTEs. Especially because they are an optimization fence, they sometimes result in less efficient plans (and sometimes in more efficient plans!) Rewrite and test without a CTE:

SELECT
  sr.sample_pk,
  eu.inventory_pk,
  .....
FROM public.sample_results AS sr 
  INNER JOIN 
     public.samples_exp_unit AS eu
  ON sr.sample_pk = eu.sample_pk ;


Also try the queries through psql. It will show if PgAdmin is the issue.

It could be that pgAdmin is taking a long time to render the results of the query, or is talking over a slow network and so takes a long time to fetch the results. It is also possible pgAdmin is doing something that changes the execution plan, like silently setting some parameters to non-default values, or preparing the query into a cursor before execution.

Code Snippets

WITH 
  eu AS (TABLE public.samples_exp_unit),
  sr AS (TABLE public.sample_results)
SELECT
  sr.sample_pk,
  eu.inventory_pk,
  .....
FROM sr INNER JOIN eu
  ON sr.sample_pk = eu.sample_pk ;
SELECT
  sr.sample_pk,
  eu.inventory_pk,
  .....
FROM public.sample_results AS sr 
  INNER JOIN 
     public.samples_exp_unit AS eu
  ON sr.sample_pk = eu.sample_pk ;

Context

StackExchange Database Administrators Q#174472, answer score: 7

Revisions (0)

No revisions yet.