patternsqlMinor
Why is Postgres CTE slower than subquery?
Viewed 0 times
whypostgresthanslowerctesubquery
Problem
I have a somewhat involved query that splits strings and outputs each word as a record.
I did a quick test, one with a CTE and one with a subquery and was somewhat surprised to see that the CTE takes twice as long to execute.
Here is the gist of what the query does:
Inline subquery
CTE
And here are the EXPLAINs for each:
Subquery Explain
```
Sort (cost=15921589.76..16082302.91 rows=64285258 width=40) (actual time=16299.150..17697.914 rows=4394788 loops=1)
Sort Key: sub_query._created_at DESC
Sort Method: external merge Disk: 116112kB
Buffers: shared hit=22915 read=7627, temp read=34281 written=34281
-> Subquery Scan on sub_qu
I did a quick test, one with a CTE and one with a subquery and was somewhat surprised to see that the CTE takes twice as long to execute.
Here is the gist of what the query does:
-- 1. translate matches characters from comment to given list (of symbols) and replaces them with commas.
-- 2. string_to_array splits string by comma and puts in an array
-- 3. unnest unpacks the array into rowsInline subquery
SELECT
sub_query.word,
sub_query._created_at
FROM
( SELECT unnest(string_to_array(translate(nps_reports.comment::text, ' ,.<>?/;:@#~[{]}=+-_)("*&^%$£!`\|}'::text, ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'::text), ','::text, ''::text)) AS word,
nps_reports.comment,
nps_reports._id,
nps_reports._created_at
FROM nps_reports
WHERE nps_reports.comment::text <> 'undefined'::text
) sub_query
WHERE sub_query.word IS NOT NULL AND NOT (sub_query.word IN ( SELECT stop_words.stop_word FROM stop_words))
ORDER BY sub_query._created_at DESC;CTE
WITH split AS
(
SELECT unnest(string_to_array(translate(nps_reports.comment::text, ' ,.<>?/;:@#~[{]}=+-_)("*&^%$£!`\|}'::text, ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'::text), ','::text, ''::text)) AS word,
nps_reports.comment,
nps_reports._id,
nps_reports._created_at
FROM nps_reports
WHERE nps_reports.comment::text <> 'undefined'::text
)
SELECT
split.word,
split._created_at
FROM split
WHERE split.word IS NOT NULL AND NOT (split.word IN ( SELECT stop_words.stop_word FROM stop_words))
ORDER BY split._created_at DESC;And here are the EXPLAINs for each:
Subquery Explain
```
Sort (cost=15921589.76..16082302.91 rows=64285258 width=40) (actual time=16299.150..17697.914 rows=4394788 loops=1)
Sort Key: sub_query._created_at DESC
Sort Method: external merge Disk: 116112kB
Buffers: shared hit=22915 read=7627, temp read=34281 written=34281
-> Subquery Scan on sub_qu
Solution
CTE's in PostgreSQL are an optimization fence. That means the query planner doesn't push optimizations across a CTE boundary.
I think a lot of this is silly though you can just write it like this.. Here we use
All of that said, whatever you're doing seems to be reinventing FTS. So that's also a bad idea.
- Blog entry about this
I think a lot of this is silly though you can just write it like this.. Here we use
CROSS JOIN LATERAL rather than the complex wrapping and NOT EXISTS rather than NOT INSELECT word,
_created_at
FROM nps_reports
CROSS JOIN LATERAL unnest(regexp_split_to_array(
nps_reports.comment,
'[^a-zA-Z0-9]+'
)) AS word
WHERE nps_reports.comment <> 'undefined'
AND nps_reports.comment IS NOT NULL
AND NOT EXISTS (
SELECT 1
FROM stop_words
WHERE stop_words.stop_word = word
)
ORDER BY _created_at DESC;All of that said, whatever you're doing seems to be reinventing FTS. So that's also a bad idea.
Code Snippets
SELECT word,
_created_at
FROM nps_reports
CROSS JOIN LATERAL unnest(regexp_split_to_array(
nps_reports.comment,
'[^a-zA-Z0-9]+'
)) AS word
WHERE nps_reports.comment <> 'undefined'
AND nps_reports.comment IS NOT NULL
AND NOT EXISTS (
SELECT 1
FROM stop_words
WHERE stop_words.stop_word = word
)
ORDER BY _created_at DESC;Context
StackExchange Database Administrators Q#188093, answer score: 7
Revisions (0)
No revisions yet.