HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Why is the CTE much worse than inline subqueries

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whythemuchthancteworseinlinesubqueries

Problem

Im trying to better understand how the query planner work in postgresql.

I have this query:

select id from users 
    where id <> 2
    and gender = (select gender from users where id = 2)
    order by latest_location::geometry  (select latest_location from users where id = 2) ASC
    limit 50


It runs in less than 10ms on my database with around 500k entries in the users table.

Then I thought that to avoid the duplicate subselects I could rewrite the query as a CTE, like this:

with me as (
    select * from users where id = 2
)
select u.id, u.popularity from users u, me 
    where u.gender = me.gender
    order by  u.latest_location::geometry  me.latest_location::geometry ASC
    limit 50;


However, this rewritten query runs in around 1 sec! Why does this happen? I see in the explains that it doesnt use the geometry index, but can anything be done for that? Thanks!

Another way to write the query is:

select u.id, u.popularity from users u, (select gender, latest_location from users where id = 2) as me 
    where u.gender = me.gender
    order by  u.latest_location::geometry  me.latest_location::geometry ASC
    limit 50;


However, this will also be as slow as the CTE.

If on the other hand I extract out the me parameters and statically insert them the query is quick again:

select u.id, u.popularity from users u
    where u.gender = 'male'
    order by  u.latest_location::geometry  '0101000000A49DE61DA71C5A403D0AD7A370F54340'::geometry ASC
    limit 50;


Explain of the first (fast) query

```
Limit (cost=5.69..20.11 rows=50 width=36) (actual time=0.512..8.114 rows=50 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using users_pkey on users users_1 (cost=0.42..2.64 rows=1 width=32) (actual time=0.032..0.033 rows=1 loops=1)
Index Cond: (id = 2)
InitPlan 2 (returns $1)
-> Index Scan using users_pkey on users users_2 (cost=0.42..2.64 rows=1 width=4) (actual time=0.009..0.010 rows=1 loops=1)

Solution

Try this:

with me as (
    select * from users where id = 2
)
select u.id, u.popularity from users u, me 
    where u.gender = (select gender from me)
    order by  u.latest_location::geometry  (select latest_location from me)::geometry ASC
    limit 50;


When I look at the fast plan here's what jumps out at me (bolded):

Limit (cost=5.69..20.11 rows=50 width=36) (actual time=0.512..8.114 rows=50 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using users_pkey on users users_1 (cost=0.42..2.64 rows=1 width=32) (actual time=0.032..0.033 rows=1 loops=1)
Index Cond: (id = 2)
InitPlan 2 (returns $1)
-> Index Scan using users_pkey on users users_2 (cost=0.42..2.64 rows=1 width=4) (actual time=0.009..0.010 rows=1 loops=1)
Index Cond: (id = 2)
-> Index Scan using users_latest_location_gix on users (cost=0.41..70796.51 rows=245470 width=36) (actual time=0.509..8.100 rows=50 loops=1)
Order By: (latest_location $0)
Filter: (gender = $1)
Rows Removed by Filter: 20
Total runtime: 8.211 ms
(12 rows)

In the slow version, the query planner is evaluating the equality operator on gender and the geometry operator on latest_location in the context of a join, where the value from me could vary with each row (even though it has correctly estimated only 1 row). In the fast version the values of gender and latest_location are treated as scalars because they are emitted by inline subqueries, which tells the query planner it only has one value of each to deal with. This is the same reason why you get the fast plan when you paste the literal values.

Code Snippets

with me as (
    select * from users where id = 2
)
select u.id, u.popularity from users u, me 
    where u.gender = (select gender from me)
    order by  u.latest_location::geometry <-> (select latest_location from me)::geometry ASC
    limit 50;

Context

StackExchange Database Administrators Q#84760, answer score: 13

Revisions (0)

No revisions yet.