patternsqlCritical
Optimizing a Postgres query with a large IN
Viewed 0 times
postgreswithquerylargeoptimizing
Problem
This query gets a list of posts created by people you follow. You can follow an unlimited number of people, but most people follow < 1000 others.
With this style of query, the obvious optimization would be to cache the
Yields:
```
Limit (cost=3.01..4555.20 rows=10 width=2601) (actual time=7923.011..7973.138 rows=10 loops=1)
-> Nested Loop Left Join (cost=3.01..9019264.02 rows=19813 width=2601) (actual time=7923.010..7973.133 rows=10 loops=1)
-> Nested Loop Left Join (cost=2.58..8935617.96 rows=19813 width=2376) (actual time=7922.995..7973.063 rows=10 loops=1)
-> Nested Loop Left Join (cost=2.15..8821537.89 rows=19813 width=2315) (actual time=7922.984..7961.868 rows=10 loops=1)
-> Nested Loop Left Join (cost=1.71..8700662.11 rows=19813 width=2090) (actual time=7922.981..7961.846 rows=10 loops=1)
-> Nested Loop Left Join (cost=1.29..8610743.68 rows=19813 width=2021) (actual time=7922.977..7961.816 rows=10 loops=1)
With this style of query, the obvious optimization would be to cache the
"Post" ids, but unfortunately I do not have the time for that right now.EXPLAIN ANALYZE SELECT
"Post"."id",
"Post"."actionId",
"Post"."commentCount",
...
FROM
"Posts" AS "Post"
INNER JOIN "Users" AS "user" ON "Post"."userId" = "user"."id"
LEFT OUTER JOIN "ActivityLogs" AS "activityLog" ON "Post"."activityLogId" = "activityLog"."id"
LEFT OUTER JOIN "WeightLogs" AS "weightLog" ON "Post"."weightLogId" = "weightLog"."id"
LEFT OUTER JOIN "Workouts" AS "workout" ON "Post"."workoutId" = "workout"."id"
LEFT OUTER JOIN "WorkoutLogs" AS "workoutLog" ON "Post"."workoutLogId" = "workoutLog"."id"
LEFT OUTER JOIN "Workouts" AS "workoutLog.workout" ON "workoutLog"."workoutId" = "workoutLog.workout"."id"
WHERE
"Post"."userId" IN (
201486,
1825186,
998608,
340844,
271909,
308218,
341986,
216893,
1917226,
... -- many more
)
AND "Post"."private" IS NULL
ORDER BY
"Post"."createdAt" DESC
LIMIT 10;Yields:
```
Limit (cost=3.01..4555.20 rows=10 width=2601) (actual time=7923.011..7973.138 rows=10 loops=1)
-> Nested Loop Left Join (cost=3.01..9019264.02 rows=19813 width=2601) (actual time=7923.010..7973.133 rows=10 loops=1)
-> Nested Loop Left Join (cost=2.58..8935617.96 rows=19813 width=2376) (actual time=7922.995..7973.063 rows=10 loops=1)
-> Nested Loop Left Join (cost=2.15..8821537.89 rows=19813 width=2315) (actual time=7922.984..7961.868 rows=10 loops=1)
-> Nested Loop Left Join (cost=1.71..8700662.11 rows=19813 width=2090) (actual time=7922.981..7961.846 rows=10 loops=1)
-> Nested Loop Left Join (cost=1.29..8610743.68 rows=19813 width=2021) (actual time=7922.977..7961.816 rows=10 loops=1)
Solution
Instead of using a huge
It'd be nice if PostgreSQL could do this internally & automatically but at this point the planner doesn't know how.
Similar topics:
IN-list, join on a VALUES expression, or if the list is large enough, use a temp table, index it, then join on it.It'd be nice if PostgreSQL could do this internally & automatically but at this point the planner doesn't know how.
Similar topics:
- https://stackoverflow.com/q/24647503/398670
- https://stackoverflow.com/q/17813492/398670
Context
StackExchange Database Administrators Q#91247, answer score: 62
Revisions (0)
No revisions yet.