patternsqlMajor

Why are the plans different if the queries are logically alike?

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

whythearedifferentalikequeriesplanslogically

Problem

I wrote two functions to answer Day 3's first homework question from Seven Databases in Seven Weeks.

Create a stored procedure where you can input a movie title or actor's name you like, and it will return the top five suggestions based on either movies the actor has starred in or films with similar genres.

My first attempt is correct but slow. It can take up to 2000ms to return a result.

CREATE OR REPLACE FUNCTION suggest_movies(IN query text, IN result_limit integer DEFAULT 5)
  RETURNS TABLE(movie_id integer, title text) AS
$BODY$
WITH suggestions AS (

  SELECT
    actors.name AS entity_term,
    movies.movie_id AS suggestion_id,
    movies.title AS suggestion_title,
    1 AS rank
  FROM actors
  INNER JOIN movies_actors ON (actors.actor_id = movies_actors.actor_id)
  INNER JOIN movies ON (movies.movie_id = movies_actors.movie_id)

  UNION ALL

  SELECT
    searches.title AS entity_term,
    suggestions.movie_id AS suggestion_id,
    suggestions.title AS suggestion_title,
    RANK() OVER (PARTITION BY searches.movie_id ORDER BY cube_distance(searches.genre, suggestions.genre)) AS rank
  FROM movies AS searches
  INNER JOIN movies AS suggestions ON
    (searches.movie_id <> suggestions.movie_id) AND
    (cube_enlarge(searches.genre, 2, 18) @> suggestions.genre)
)
SELECT suggestion_id, suggestion_title
FROM suggestions
WHERE entity_term = query
ORDER BY rank, suggestion_id
LIMIT result_limit;
$BODY$
LANGUAGE sql;

My second attempt is correct and fast. I optimized it by pushing the filter down from the CTE into each part of the union.

I removed this line from the outer query:

WHERE entity_term = query

I added this line to the first inner query:

WHERE actors.name = query

I added this line to the second inner query:

WHERE movies.title = query

The second function takes about 10ms to return the same result.

Nothing differs in the database apart from the function definitions.

Why does PostgreSQL produce such different plan

Solution

No automatic predicate pushdown for CTEs

PostgreSQL 9.3 doesn't do predicate pushdown for CTEs.

An optimizer that does predicate pushdown can move where clauses into inner queries. The goal is to filter out irrelevant data as early as possible. As long as the new query is logically equivalent, the engine still fetches all the relevant data, so produces the correct result, only more quickly.

Core developer Tom Lane alludes to the difficulty of determining logical equivalence on the pgsql-performance mailing list.

CTEs are also treated as optimization fences; this is not so much an
optimizer limitation as to keep the semantics sane when the CTE contains
a writable query.

The optimizer doesn't distinguish read-only CTEs from writable ones, so is overly conservative when considering plans. The 'fence' treatment stops the optimizer from moving the where clause inside the CTE, although we can see it is safe to do so.

We can wait for the PostgreSQL team to improve CTE optimization, but for now to get good performance you have to change your writing style.
Rewrite for performance

The question already shows one way to get a better plan. Duplicating the filter condition essentially hard-codes the effect of predicate pushdown.

In both plans, the engine copies result rows to a worktable so it can sort them. The larger the worktable, the slower the query.

The first plan copies all the rows in the base tables to the worktable and scans that to find the result. To make things even slower, the engine must scan the whole worktable because it has no indexes.

That's a ridiculous amount of unnecessary work. It reads all the data in the base tables twice to find the answer, when there are just an estimated 5 matching rows out of an estimated 19350 rows in the base tables.

The second plan uses the indexes to find the matching rows and copies just those to the worktable. The index effectively filtered the data for us.

On page 85 of The Art of SQL, Stéphane Faroult reminds us of user expectations.

To a very large extent, end users adjust thier patience to the number of rows they expect: when they ask for one needle, they pay little attention to the size of the haystack.

The second plan scales with the needle, so is more likely to keep your users happy.
Rewrite for maintainability

The new query is harder to maintain because you can introduce a defect by changing one filter expression but not the other.

Wouldn't it be great if we could write everything just once and still get good performance?

We can. The optimizer does predicate pushdown for subqeries.

A simpler example is easier to explain.

CREATE TABLE a (c INT);

CREATE TABLE b (c INT);

CREATE INDEX a_c ON a(c);

CREATE INDEX b_c ON b(c);

INSERT INTO a SELECT 1 FROM generate_series(1, 1000000);

INSERT INTO b SELECT 2 FROM a;

INSERT INTO a SELECT 3;

This creates two tables each with an indexed column. Together they contain a million 1s, a million 2s, and one 3.

You can find the needle 3 using either of these queries.

-- CTE
EXPLAIN ANALYZE
WITH cte AS (
  SELECT c FROM a
  UNION ALL
  SELECT c FROM b
)
SELECT c FROM cte WHERE c = 3;

-- Subquery
EXPLAIN ANALYZE
SELECT c
FROM (
  SELECT c FROM a
  UNION ALL
  SELECT c FROM b
) AS subquery
WHERE c = 3;

The plan for the CTE is slow. The engine scans three tables and reads about four million rows. It takes nearly 1000 milliseconds.

CTE Scan on cte  (cost=33275.00..78275.00 rows=10000 width=4) (actual time=471.412..943.225 rows=1 loops=1)
  Filter: (c = 3)
  Rows Removed by Filter: 2000000
  CTE cte
    ->  Append  (cost=0.00..33275.00 rows=2000000 width=4) (actual time=0.011..409.573 rows=2000001 loops=1)
          ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.010..114.869 rows=1000001 loops=1)
          ->  Seq Scan on b  (cost=0.00..18850.00 rows=1000000 width=4) (actual time=5.530..104.674 rows=1000000 loops=1)
Total runtime: 948.594 ms

The plan for the subquery is fast. The engine just seeks each index. It takes less than a millisecond.

Append  (cost=0.42..8.88 rows=2 width=4) (actual time=0.021..0.038 rows=1 loops=1)
  ->  Index Only Scan using a_c on a  (cost=0.42..4.44 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=1)
        Index Cond: (c = 3)
        Heap Fetches: 1
  ->  Index Only Scan using b_c on b  (cost=0.42..4.44 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1)
        Index Cond: (c = 3)
        Heap Fetches: 0
Total runtime: 0.065 ms

See SQLFiddle for an interactive version.

Code Snippets

CREATE TABLE a (c INT);

CREATE TABLE b (c INT);

CREATE INDEX a_c ON a(c);

CREATE INDEX b_c ON b(c);

INSERT INTO a SELECT 1 FROM generate_series(1, 1000000);

INSERT INTO b SELECT 2 FROM a;

INSERT INTO a SELECT 3;

-- CTE
EXPLAIN ANALYZE
WITH cte AS (
  SELECT c FROM a
  UNION ALL
  SELECT c FROM b
)
SELECT c FROM cte WHERE c = 3;

-- Subquery
EXPLAIN ANALYZE
SELECT c
FROM (
  SELECT c FROM a
  UNION ALL
  SELECT c FROM b
) AS subquery
WHERE c = 3;

CTE Scan on cte  (cost=33275.00..78275.00 rows=10000 width=4) (actual time=471.412..943.225 rows=1 loops=1)
  Filter: (c = 3)
  Rows Removed by Filter: 2000000
  CTE cte
    ->  Append  (cost=0.00..33275.00 rows=2000000 width=4) (actual time=0.011..409.573 rows=2000001 loops=1)
          ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.010..114.869 rows=1000001 loops=1)
          ->  Seq Scan on b  (cost=0.00..18850.00 rows=1000000 width=4) (actual time=5.530..104.674 rows=1000000 loops=1)
Total runtime: 948.594 ms

Append  (cost=0.42..8.88 rows=2 width=4) (actual time=0.021..0.038 rows=1 loops=1)
  ->  Index Only Scan using a_c on a  (cost=0.42..4.44 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=1)
        Index Cond: (c = 3)
        Heap Fetches: 1
  ->  Index Only Scan using b_c on b  (cost=0.42..4.44 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1)
        Index Cond: (c = 3)
        Heap Fetches: 0
Total runtime: 0.065 ms

Context

StackExchange Database Administrators Q#54388, answer score: 27

Revisions (0)

No revisions yet.