HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Why is Postgres CTE slower than subquery?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whypostgresthanslowerctesubquery

Problem

I have a somewhat involved query that splits strings and outputs each word as a record.

I did a quick test, one with a CTE and one with a subquery and was somewhat surprised to see that the CTE takes twice as long to execute.

Here is the gist of what the query does:

-- 1. translate matches characters from comment to given list (of symbols) and replaces them with commas.
-- 2. string_to_array splits string by comma and puts in an array
-- 3. unnest unpacks the array into rows


Inline subquery

SELECT
    sub_query.word,
    sub_query._created_at
FROM 
(   SELECT unnest(string_to_array(translate(nps_reports.comment::text, ' ,.<>?/;:@#~[{]}=+-_)("*&^%$£!`\|}'::text, ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'::text), ','::text, ''::text)) AS word,
        nps_reports.comment,
        nps_reports._id,
        nps_reports._created_at
    FROM nps_reports
    WHERE nps_reports.comment::text <> 'undefined'::text
) sub_query 
WHERE sub_query.word IS NOT NULL AND NOT (sub_query.word IN ( SELECT stop_words.stop_word FROM stop_words))
ORDER BY sub_query._created_at DESC;


CTE

WITH split AS
(
SELECT unnest(string_to_array(translate(nps_reports.comment::text, ' ,.<>?/;:@#~[{]}=+-_)("*&^%$£!`\|}'::text, ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'::text), ','::text, ''::text)) AS word,
    nps_reports.comment,
    nps_reports._id,
    nps_reports._created_at
FROM nps_reports
WHERE nps_reports.comment::text <> 'undefined'::text
)

SELECT
    split.word,
    split._created_at
FROM split
WHERE split.word IS NOT NULL AND NOT (split.word IN ( SELECT stop_words.stop_word FROM stop_words))
ORDER BY split._created_at DESC;


And here are the EXPLAINs for each:

Subquery Explain

```
Sort (cost=15921589.76..16082302.91 rows=64285258 width=40) (actual time=16299.150..17697.914 rows=4394788 loops=1)
Sort Key: sub_query._created_at DESC
Sort Method: external merge Disk: 116112kB
Buffers: shared hit=22915 read=7627, temp read=34281 written=34281
-> Subquery Scan on sub_qu

Solution

CTE's in PostgreSQL are an optimization fence. That means the query planner doesn't push optimizations across a CTE boundary.

  • Blog entry about this



I think a lot of this is silly though you can just write it like this.. Here we use CROSS JOIN LATERAL rather than the complex wrapping and NOT EXISTS rather than NOT IN

SELECT word,
  _created_at
FROM nps_reports
CROSS JOIN LATERAL unnest(regexp_split_to_array(
  nps_reports.comment,
  '[^a-zA-Z0-9]+'
)) AS word
WHERE nps_reports.comment <> 'undefined'
  AND nps_reports.comment IS NOT NULL
  AND NOT EXISTS (
    SELECT 1
    FROM stop_words
    WHERE stop_words.stop_word = word
  )
ORDER BY _created_at DESC;


All of that said, whatever you're doing seems to be reinventing FTS. So that's also a bad idea.

Code Snippets

SELECT word,
  _created_at
FROM nps_reports
CROSS JOIN LATERAL unnest(regexp_split_to_array(
  nps_reports.comment,
  '[^a-zA-Z0-9]+'
)) AS word
WHERE nps_reports.comment <> 'undefined'
  AND nps_reports.comment IS NOT NULL
  AND NOT EXISTS (
    SELECT 1
    FROM stop_words
    WHERE stop_words.stop_word = word
  )
ORDER BY _created_at DESC;

Context

StackExchange Database Administrators Q#188093, answer score: 7

Revisions (0)

No revisions yet.