HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlCritical

Optimizing a Postgres query with a large IN

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgreswithquerylargeoptimizing

Problem

This query gets a list of posts created by people you follow. You can follow an unlimited number of people, but most people follow < 1000 others.

With this style of query, the obvious optimization would be to cache the "Post" ids, but unfortunately I do not have the time for that right now.

EXPLAIN ANALYZE SELECT
    "Post"."id",
    "Post"."actionId",
    "Post"."commentCount",
    ...
FROM
    "Posts" AS "Post"
INNER JOIN "Users" AS "user" ON "Post"."userId" = "user"."id"
LEFT OUTER JOIN "ActivityLogs" AS "activityLog" ON "Post"."activityLogId" = "activityLog"."id"
LEFT OUTER JOIN "WeightLogs" AS "weightLog" ON "Post"."weightLogId" = "weightLog"."id"
LEFT OUTER JOIN "Workouts" AS "workout" ON "Post"."workoutId" = "workout"."id"
LEFT OUTER JOIN "WorkoutLogs" AS "workoutLog" ON "Post"."workoutLogId" = "workoutLog"."id"
LEFT OUTER JOIN "Workouts" AS "workoutLog.workout" ON "workoutLog"."workoutId" = "workoutLog.workout"."id"
WHERE
"Post"."userId" IN (
    201486,
    1825186,
    998608,
    340844,
    271909,
    308218,
    341986,
    216893,
    1917226,
    ...  -- many more
)
AND "Post"."private" IS NULL
ORDER BY
    "Post"."createdAt" DESC
LIMIT 10;


Yields:

```
Limit (cost=3.01..4555.20 rows=10 width=2601) (actual time=7923.011..7973.138 rows=10 loops=1)
-> Nested Loop Left Join (cost=3.01..9019264.02 rows=19813 width=2601) (actual time=7923.010..7973.133 rows=10 loops=1)
-> Nested Loop Left Join (cost=2.58..8935617.96 rows=19813 width=2376) (actual time=7922.995..7973.063 rows=10 loops=1)
-> Nested Loop Left Join (cost=2.15..8821537.89 rows=19813 width=2315) (actual time=7922.984..7961.868 rows=10 loops=1)
-> Nested Loop Left Join (cost=1.71..8700662.11 rows=19813 width=2090) (actual time=7922.981..7961.846 rows=10 loops=1)
-> Nested Loop Left Join (cost=1.29..8610743.68 rows=19813 width=2021) (actual time=7922.977..7961.816 rows=10 loops=1)

Solution

Instead of using a huge IN-list, join on a VALUES expression, or if the list is large enough, use a temp table, index it, then join on it.

It'd be nice if PostgreSQL could do this internally & automatically but at this point the planner doesn't know how.

Similar topics:

  • https://stackoverflow.com/q/24647503/398670



  • https://stackoverflow.com/q/17813492/398670

Context

StackExchange Database Administrators Q#91247, answer score: 62

Revisions (0)

No revisions yet.