Really slow DISTINCT ON query with multiple joins
Problem
Originally posted:
https://stackoverflow.com/questions/11173717/expensive-query-on-select-distinct-with-multiple-inner-join-in-postgres
The songs table has only about 4k rows; posts and stations have less. Running the query without the DISTINCT ON fixes it. Running Postgres on Mac OS X Lion.
```
Song Load (7358.2ms)
EXPLAIN (426.2ms)
EXPLAIN for:
SELECT DISTINCT ON (songs.rank, songs.shared_id) songs.*,
songs.*,
posts.url as post_url,
posts.excerpt as post_excerpt,
stations.title as station_title,
stations.slug as station_slug
FROM "songs"
INNER JOIN "posts" ON "posts"."id" = "songs"."post_id"
inner join stations on stations.blog_id = songs.blog_id
WHERE "songs"."processed" = 't'
AND "songs"."working" = 't'
ORDER BY songs.rank desc
LIMIT 24 OFFSET 0
QUERY PLAN
------------------------------------------------------------------------------------------------
Limit (cost=546147.28..546159.16 rows=24 width=2525)
-> Unique (cost=546147.28..547360.75 rows=2452 width=2525)
-> Sort (cost=546147.28..546551.77 rows=161796 width=2525)
Sort Key: songs.rank, songs.shared_id
-> Hash Join (cost=466.50..2906.84 rows=161796 width=2525)
Hash Cond: (songs.blog_id = stations.blog_id)
-> Hash Join (cost=249.41..587.52 rows=2452 width=2499)
Hash Cond: (songs.post_id = posts.id)
-> Seq Scan on songs (cost=0.00..304.39 rows=2452 width=2223)
Filter: (processed AND working)
-> Hash (cost=230.85..230.85 rows=1485 width=280)
-> Seq Scan on posts (cost=0.00..230.85 rows=1485 width=280)
-> Hash (cost=140.93..140.93 rows=6093 width=30)
-> Seq Scan on stations (cost=0.00..140.93 rows=6093 width=30)
```
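For anyone who wants to reproduce the plan locally, here is a minimal sketch of the schema implied by the query and the EXPLAIN output. Only the referenced columns are included, and all types are assumptions, not taken from the original question:

```
-- Hypothetical schema reconstructed from the query; column types are guesses.
CREATE TABLE posts (
    id      serial PRIMARY KEY,
    url     text,
    excerpt text
);

CREATE TABLE stations (
    id      serial PRIMARY KEY,
    blog_id integer,
    title   text,
    slug    text
);

CREATE TABLE songs (
    id        serial PRIMARY KEY,
    post_id   integer REFERENCES posts (id),
    blog_id   integer,
    processed boolean,
    working   boolean,
    rank      integer,
    shared_id text
);
```

Populating songs with a few thousand rows should be enough to compare plans, since the question reports only about 4k rows.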
Solution
Because the WHERE condition of the query involves only equality checks:

```
WHERE "songs"."processed" = 't'
  AND "songs"."working" = 't'
```

and then you have:

```
SELECT DISTINCT ON (songs.rank, songs.shared_id) ...
```

which is similar to:

```
GROUP BY songs.rank, songs.shared_id
```

I would first try adding a compound index on (first the columns in WHERE, then the columns in DISTINCT ON):

```
(processed, working, rank, shared_id)
```

The ordering ORDER BY rank DESC may be better optimized if you have the index as:

```
(processed, working, rank DESC, shared_id)
```

Not really sure if this would contribute to efficiency, but you can test.

Addition by @Erwin

As per request in a comment:

In principle, (default) b-tree indexes can be scanned forward and backward at the same speed. But sorting can make a difference in multi-column indexes where you combine the sort order of multiple columns. The query starts with:

```
SELECT DISTINCT ON (songs.rank, songs.shared_id)
```

In combination with ORDER BY rank DESC, this dictates that the result is effectively ordered by rank DESC, shared_id, after the (simplified) WHERE clause WHERE processed AND working has been applied and before LIMIT can be applied.

I have my doubts whether the DISTINCT ON clause is actually useful. But while it is there, the optimal index for the query should be (just as @ypercube suspected):

```
CREATE INDEX songs_special_idx
ON songs (processed, working, rank DESC, shared_id);
```

This looks like one of the rare cases where explicit ordering of index columns would benefit the query. There is an excellent explanation in the chapter Indexes and ORDER BY of the manual.

If the WHERE condition is stable (always WHERE processed AND working), a partial multi-column index would be smaller and faster yet:

```
CREATE INDEX songs_special_idx
ON songs (rank DESC, shared_id)
WHERE processed AND working;
```
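To confirm that the planner actually picks up the suggested index, one can create it and re-run the query under EXPLAIN ANALYZE; if the index helps, Postgres should read rows in index order instead of building the expensive Sort node seen in the plan above. A sketch, reusing the index and query from the answer (joins left out to isolate the effect of the index on songs):

```
-- Index suggested in the answer (column order: WHERE columns first,
-- then the DISTINCT ON / ORDER BY columns, rank descending).
CREATE INDEX songs_special_idx
ON songs (processed, working, rank DESC, shared_id);

ANALYZE songs;  -- refresh planner statistics before comparing plans

EXPLAIN ANALYZE
SELECT DISTINCT ON (songs.rank, songs.shared_id) songs.*
FROM songs
WHERE songs.processed
  AND songs.working
ORDER BY songs.rank DESC
LIMIT 24 OFFSET 0;
```

Whether the full three-table query benefits as much depends on how the planner orders the joins, so it is worth comparing EXPLAIN ANALYZE output for both the reduced and the complete query.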
Context
StackExchange Database Administrators Q#19804, answer score: 6