EXISTS() vs EXISTS() = TRUE in Postgres
Problem
Faced weird behaviour with `EXISTS` (also applies for `NOT EXISTS`) generating different execution plans for `WHERE EXISTS(...)`
```
EXPLAIN ANALYZE
SELECT * FROM books
WHERE EXISTS (SELECT 1 FROM authors WHERE id = books.author_id AND name LIKE 'asd%');
| QUERY PLAN |
| -------------------------------------------------------------------------------------------------------------- |
| Hash Join (cost=218.01..454.43 rows=56 width=40) (actual time=0.975..0.975 rows=0 loops=1) |
|   Hash Cond: (books.author_id = authors.id) |
|   -> Seq Scan on books (cost=0.00..206.80 rows=11280 width=40) (actual time=0.010..0.010 rows=1 loops=1) |
|   -> Hash (cost=217.35..217.35 rows=53 width=4) (actual time=0.943..0.943 rows=0 loops=1) |
|         Buckets: 1024 Batches: 1 Memory Usage: 8kB |
|         -> Seq Scan on authors (cost=0.00..217.35 rows=53 width=4) (actual time=0.942..0.943 rows=0 loops=1) |
|               Filter: ((name)::text ~~ 'asd%'::text) |
|               Rows Removed by Filter: 10000 |
| Planning Time: 0.361 ms |
| Execution Time: 1.022 ms |
```
vs. `WHERE EXISTS(...) = TRUE`
```
EXPLAIN ANALYZE
SELECT * FROM books
WHERE EXISTS (SELECT id FROM authors WHERE id = books.author_id AND name LIKE 'asd%') = True;
| QUERY PLAN |
| ------------------------------------------------------------------
```
Solution
PostgreSQL is able to optimize `WHERE EXISTS (/* correlated subquery */)` into a join or semi-join, but it is not smart enough to detect that the `= TRUE` in `EXISTS (...) = TRUE` can be removed, so it does not apply the optimization here.

Since the optimization is not used, it is unsurprising that the second plan is slower. Although, to be honest, with a tiny query like that the difference could just be noise.
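
In practice, the fix is to drop the comparison and use the bare predicate. The following is a minimal sketch for trying this out. The table definitions, column types, and row counts are assumptions inferred from the queries in the question, not the asker's actual schema.

```sql
-- Hypothetical schema and data (assumed; only the column names come from the question).
CREATE TABLE authors (id integer PRIMARY KEY, name varchar(100));
CREATE TABLE books (id integer PRIMARY KEY, author_id integer REFERENCES authors (id), title varchar(100));

INSERT INTO authors (id, name)
SELECT g, 'author ' || g::text FROM generate_series(1, 10000) AS g;

INSERT INTO books (id, author_id, title)
SELECT g, g, 'book ' || g::text FROM generate_series(1, 10000) AS g;

-- Refresh planner statistics so the estimates reflect the inserted rows.
ANALYZE authors;
ANALYZE books;

-- Bare EXISTS: the form the planner can turn into a join or semi-join.
EXPLAIN
SELECT * FROM books
WHERE EXISTS (SELECT 1 FROM authors WHERE id = books.author_id AND name LIKE 'asd%');

-- Same predicate wrapped in "= TRUE", shown only for comparison.
EXPLAIN
SELECT * FROM books
WHERE EXISTS (SELECT 1 FROM authors WHERE id = books.author_id AND name LIKE 'asd%') = TRUE;

-- Negated case: write NOT EXISTS (...) instead of EXISTS (...) = FALSE.
EXPLAIN
SELECT * FROM books
WHERE NOT EXISTS (SELECT 1 FROM authors WHERE id = books.author_id AND name LIKE 'asd%');
```

The exact plans depend on the PostgreSQL version, statistics, and settings. What to look for is whether the plan contains a join node (as in the first plan in the question) or a `SubPlan` evaluated in a filter on `books`.
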

Some background for the second execution plan:

The second plan with the `alternatives:` shows that you are using an older version of PostgreSQL, which still had AlternativeSubPlans. The idea behind that was that PostgreSQL could potentially decide to start using a different subplan during query execution if the row count estimates proved to be off. This capability was removed with commit 41efb83408 in v14. You may want to refer to Tom Lane's commit message for details:

Move resolution of AlternativeSubPlan choices to the planner.

When commit bd3daddaf introduced AlternativeSubPlans, I had some
ambitions towards allowing the choice of subplan to change during
execution. That has not happened, or even been thought about, in the
ensuing twelve years; so it seems like a failed experiment. So let's
rip that out and resolve the choice of subplan at the end of planning
(in setrefs.c) rather than during executor startup. This has a number
of positive benefits:

* Removal of a few hundred lines of executor code, since
AlternativeSubPlans need no longer be supported there.

* Removal of executor-startup overhead (particularly, initialization
of subplans that won't be used).

* Removal of incidental costs of having a larger plan tree, such as
tree-scanning and copying costs in the plancache; not to mention
setrefs.c's own costs of processing the discarded subplans.

* EXPLAIN no longer has to print a weird (and undocumented)
representation of an AlternativeSubPlan choice; it sees only the
subplan actually used. This should mean less confusion for users.

* Since setrefs.c knows which subexpression of a plan node it's
working on at any instant, it's possible to adjust the estimated
number of executions of the subplan based on that. For example,
we should usually estimate more executions of a qual expression
than a targetlist expression. The implementation used here is
pretty simplistic, because we don't want to expend a lot of cycles
on the issue; but it's better than ignoring the point entirely,
as the executor had to.

That last point might possibly result in shifting the choice
between hashed and non-hashed EXISTS subplans in a few cases,
but in general this patch isn't meant to change planner choices.
Since we're doing the resolution so late, it's really impossible
to change any plan choices outside the AlternativeSubPlan itself.

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/1992952.1592785225@sss.pgh.pa.us
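
To tell whether a server predates that change, and would therefore still print `alternatives:` in `EXPLAIN` output, you can check its version. This is a small sketch that uses only the standard `server_version` and `server_version_num` settings. The column alias is just an illustrative name.

```sql
-- Human-readable server version.
SHOW server_version;

-- Numeric form, convenient for comparisons: 140000 and above means v14 or later.
SELECT current_setting('server_version_num')::int >= 140000 AS is_v14_or_later;
```
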
Context
StackExchange Database Administrators Q#312569, answer score: 9