patternsqlMinor
Very slow simple JOIN query
Viewed 0 times
simplequeryjoinslowvery
Problem
Simple DB structure (for an online forum):
Around 80k entries in
`Limit (cost=316926.14..316926.39 rows=100 width=20) (actual time=2326.812..2326.830 rows=100 loops=1)
-> Sort (cost=316926.14..317014.83 rows=35476 width=20) (actual time=2326.809..2326.820 rows=100 loops=1)
Sort Key: (count(p.id)) DESC
Sort Method: top-N heapsort Memory: 32kB
-> HashAggregate (cost=315215.51..315570.27 rows=35476 width=20) (actual time=2311.296..2321.739 rows=34608 loops=1)
Group Key: u.id
-> Hash Join (cost=1176.89..308201.88 rows=1402727 width=16) (actual time=16.538..1784.546 rows=1910831 loops=1)
Hash Cond: (p.user_id = u.id)
-> Seq Scan on posts p (cost=0.00..286185.34 rows=1816634 width=8) (actual time=0.103..1144.681 rows=2173916 loops=1)
-> Hash (cost=733.44..733.44 rows=35476 width=12) (actual time=15.763..15.763 rows=34609 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2021kB
-> Seq Scan on users u (cost=0.00..733.44 rows=35476 width=12) (actual time=0.033..6.521 rows=34609 loops=1)
CREATE TABLE users (
id integer NOT NULL PRIMARY KEY,
username text
);
CREATE INDEX ON users (username);
CREATE TABLE posts (
id integer NOT NULL PRIMARY KEY,
thread_id integer NOT NULL REFERENCES threads (id),
user_id integer NOT NULL REFERENCES users (id),
date timestamp without time zone NOT NULL,
content text
);
CREATE INDEX ON posts (thread_id);
CREATE INDEX ON posts (user_id);Around 80k entries in
users and 2,6 million entries in posts tables. This simple query to get top 100 users by their posts takes 2,4 seconds:EXPLAIN ANALYZE SELECT u.id, u.username, COUNT(p.id) AS PostCount FROM users u
INNER JOIN posts p on p.user_id = u.id
WHERE u.username IS NOT NULL
GROUP BY u.id
ORDER BY PostCount DESC LIMIT 100;`Limit (cost=316926.14..316926.39 rows=100 width=20) (actual time=2326.812..2326.830 rows=100 loops=1)
-> Sort (cost=316926.14..317014.83 rows=35476 width=20) (actual time=2326.809..2326.820 rows=100 loops=1)
Sort Key: (count(p.id)) DESC
Sort Method: top-N heapsort Memory: 32kB
-> HashAggregate (cost=315215.51..315570.27 rows=35476 width=20) (actual time=2311.296..2321.739 rows=34608 loops=1)
Group Key: u.id
-> Hash Join (cost=1176.89..308201.88 rows=1402727 width=16) (actual time=16.538..1784.546 rows=1910831 loops=1)
Hash Cond: (p.user_id = u.id)
-> Seq Scan on posts p (cost=0.00..286185.34 rows=1816634 width=8) (actual time=0.103..1144.681 rows=2173916 loops=1)
-> Hash (cost=733.44..733.44 rows=35476 width=12) (actual time=15.763..15.763 rows=34609 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2021kB
-> Seq Scan on users u (cost=0.00..733.44 rows=35476 width=12) (actual time=0.033..6.521 rows=34609 loops=1)
Solution
This may or may not work - I'm basing this off a gut feeling that it's joining your tables before the group and filter. I suggest trying the following: filter and group using a CTE before attempting the join:
The query planner sometimes just needs a little guidance. This solution works well here, but CTEs can potentially be terrible in some circumstances. CTEs are stored exclusively in memory. As a result of this, large data returns can exceed Postgres' allocated memory and start swapping (paging in MS). CTEs also cannot be indexed, so a sufficiently large query could still cause significant slow down when querying your CTE.
The best advice you can really take away is to try it multiple ways and check your query plans.
with
__posts as(
select
user_id,
count(1) as num_posts
from
posts
group by
user_id
order by
num_posts desc
limit 100
)
select
users.username,
__posts.num_posts
from
users
inner join __posts on(
__posts.user_id = users.id
)
order by
num_posts descThe query planner sometimes just needs a little guidance. This solution works well here, but CTEs can potentially be terrible in some circumstances. CTEs are stored exclusively in memory. As a result of this, large data returns can exceed Postgres' allocated memory and start swapping (paging in MS). CTEs also cannot be indexed, so a sufficiently large query could still cause significant slow down when querying your CTE.
The best advice you can really take away is to try it multiple ways and check your query plans.
Code Snippets
with
__posts as(
select
user_id,
count(1) as num_posts
from
posts
group by
user_id
order by
num_posts desc
limit 100
)
select
users.username,
__posts.num_posts
from
users
inner join __posts on(
__posts.user_id = users.id
)
order by
num_posts descContext
StackExchange Database Administrators Q#169431, answer score: 8
Revisions (0)
No revisions yet.