patternsqlModerate

Have Postgresql query planner use nested loop w/ indices over hash join

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

postgresqlindicesplannerqueryloophashjoinnesteduseover

Problem

I'm having a problem with some StackOverflow-schema related data loaded into PostgreSQL 9.3.4. I have a query that is taking about 10x longer than it should, due to the fact that it is choosing to use a hash join instead of a nested loop w/ indices. For example, if I select 500 users in the query, a hash join is used instead of using the id and type indices on the post_tokenized table:

```
explain
select creation_epoch, user_screen_name, chunk from post_tokenized as tokenized_tbl
join posts as posts_tbl
on posts_tbl.id = tokenized_tbl.id
where type = 'tag'
and user_screen_name is not null
and owner_user_id in (select id from users where reputation > 100000 order by reputation asc limit 500)
and tokenized_tbl.id in (select id from posts where owner_user_id in (select id from users where reputation > 100000 order by reputation asc limit 500))

Hash Join (cost=29570.13..751852.55 rows=119954 width=21)
Hash Cond: (tokenized_tbl.id = posts_tbl.id)
-> Index Scan using type_index_post_tokenized on post_tokenized tokenized_tbl (cost=0.44..646219.29 rows=20281711 width=8)
Index Cond: (type = 'tag'::text)
-> Hash (cost=29561.73..29561.73 rows=637 width=25)
-> Hash Join (cost=15576.75..29561.73 rows=637 width=25)
Hash Cond: (posts_tbl.id = posts.id)
-> Nested Loop (cost=48.20..12824.71 rows=106853 width=21)
-> HashAggregate (cost=47.76..52.76 rows=500 width=4)
-> Limit (cost=0.43..41.51 rows=500 width=8)
-> Index Scan using reputation_index_users on users (cost=0.43..211.57 rows=2570 width=8)
Index Cond: (reputation > 100000)
-> Index Scan using owner_user_id_index_posts on posts posts_tbl (cost=0.44..23.40 rows=214 width=25)

Solution

This closely related answer on SO has an answer to your primary question:

Setting enable_seqscan = off in a single SELECT query

You could use that in similar fashion, to disable hash joins for the current transaction:

SET LOCAL enable_hashjoin=off;

But that's not my advice. Read the answer over there.

And this one about statistics and cost settings:

Keep PostgreSQL from sometimes choosing a bad query plan

More importantly, untangle your query first:

SELECT creation_epoch, user_screen_name, chunk
FROM  (
   SELECT id AS owner_user_id
   FROM   users
   WHERE  reputation > 100000
   ORDER  BY reputation 
   LIMIT  500
   ) u
JOIN   posts p USING (owner_user_id)
JOIN   post_tokenized t USING (id)
WHERE  type = 'tag'
AND    user_screen_name IS NOT NULL;

Should be considerably faster and also make it easier for the query planner to choose the best plan (given sane cost settings and table statistics).

Code Snippets

SET LOCAL enable_hashjoin=off;

SELECT creation_epoch, user_screen_name, chunk
FROM  (
   SELECT id AS owner_user_id
   FROM   users
   WHERE  reputation > 100000
   ORDER  BY reputation 
   LIMIT  500
   ) u
JOIN   posts p USING (owner_user_id)
JOIN   post_tokenized t USING (id)
WHERE  type = 'tag'
AND    user_screen_name IS NOT NULL;

Context

StackExchange Database Administrators Q#66678, answer score: 14

Revisions (0)

No revisions yet.