snippetsqlModerate
How to use index to speed up sorting in Postgres?
Viewed 0 times
sortingpostgreshowindexusespeed
Problem
I have a table
I want to return all messages ordered by
The query I used is:
Here's an SQL fiddle with schema, exact query, and expected result.
The performance of the query is slow once the
messages in a Postgres 9.4 database. Messages belong to feed_id, and have posted_at. Also, messages can have a parent message (in case of replies). Table "public.messages"
Column | Type | Modifiers
------------------------------+-----------------------------+-----------
message_id | character varying(255) | not null
feed_id | integer |
parent_id | character varying(255) |
posted_at | timestamp without time zone |
share_count | integer |
Indexes:
"messages_pkey" PRIMARY KEY, btree (message_id)
"index_messages_on_feed_id_posted_at" btree (feed_id, posted_at DESC NULLS LAST)
I want to return all messages ordered by
share_count, but for each parent_id, I only want to return one message. ie, if multiple messages have the same parent_id, then only the latest one (posted_at) is returned. The parent_id can be null, messages with null parent_id should all return.The query I used is:
WITH filtered_messages AS (SELECT *
FROM messages
WHERE feed_id IN (7)
AND (posted_at >= '2015-01-01 04:00:00.000000')
AND (posted_at < '2015-04-28 04:00:00.000000'))
SELECT *
FROM (SELECT DISTINCT ON(COALESCE(parent_id, message_id)) parent_id,
message_id,
posted_at,
share_count
FROM filtered_messages
ORDER BY COALESCE(parent_id, message_id), posted_at DESC NULLS LAST
) messages
ORDER BY share_count DESC NULLS LAST, posted_at DESC NULLS LAST;Here's an SQL fiddle with schema, exact query, and expected result.
The performance of the query is slow once the
messages table gets big. I tried add multiple sorting indexes, but it does not seem to Solution
Query
This query should be substantially faster in any case:
The CTE does nothing here that a plain subquery could not deliver also. And a CTE introduces an optimization barrier since it is executed separately and its result is materialized.
Update: that changes in Postgres 12. See:
You had one more subquery-level than you actually need.
The expression
Splitting the simple case of
Indices
Especially when supported with these indices:
The two partial indices cover the whole table together and are about the same size together as a single total index.
Covering indexes (with the
The
db<>fiddle here
Old sqlfiddle
Depending on missing details,
Possibly faster alternatives here:
This query should be substantially faster in any case:
SELECT parent_id, message_id, posted_at, share_count
FROM messages
WHERE feed_id = 7
AND posted_at >= '2015-01-01 4:0:0'
AND posted_at = '2015-01-01 4:0:0'
AND posted_at < '2015-04-28 4:0:0'
AND parent_id IS NOT NULL -- match index condition
ORDER BY parent_id, posted_at DESC NULLS LAST
)
ORDER BY share_count DESC NULLS LAST, posted_at DESC NULLS LAST;The CTE does nothing here that a plain subquery could not deliver also. And a CTE introduces an optimization barrier since it is executed separately and its result is materialized.
Update: that changes in Postgres 12. See:
- How to prevent PostgreSQL from rewriting OUTER JOIN queries?
You had one more subquery-level than you actually need.
The expression
(COALESCE(parent_id, message_id) is not compatible with a plain index. You would need an index on that expression. But that may not be very useful either, depending on data distribution. See links below.Splitting the simple case of
parent_id IS NULL into a separate SELECT may or may not deliver the optimum. Especially not, if that's a rare case anyway, in which case a combined query with an index on (COALESCE(parent_id, message_id) may perform better. Other considerations apply ...Indices
Especially when supported with these indices:
CREATE INDEX messages_idx_null ON messages (feed_id, posted_at DESC NULLS LAST, share_count DESC NULLS LAST)
INCLUDE (parent_id, message_id)
WHERE parent_id IS NULL;
CREATE INDEX messages_idx_notnull ON messages (feed_id, posted_at DESC NULLS LAST, share_count DESC NULLS LAST)
INCLUDE (parent_id, message_id)
WHERE parent_id IS NOT NULL;The two partial indices cover the whole table together and are about the same size together as a single total index.
Covering indexes (with the
INCLUDE clause) were added with Postgres 11. See:- Do covering indexes in PostgreSQL help JOIN columns?
The
INCLUDE part only make sense if you get index-only scans out of it. Else remove it from both indices.db<>fiddle here
Old sqlfiddle
Depending on missing details,
DISTINCT ON may or may not be the best query technique for the purpose. Details here:- Select first row in each GROUP BY group?
Possibly faster alternatives here:
- Optimize GROUP BY query to retrieve latest record per user
Code Snippets
SELECT parent_id, message_id, posted_at, share_count
FROM messages
WHERE feed_id = 7
AND posted_at >= '2015-01-01 4:0:0'
AND posted_at < '2015-04-28 4:0:0'
AND parent_id IS NULL -- match index condition
UNION ALL
(
SELECT DISTINCT ON(parent_id)
parent_id, message_id, posted_at, share_count
FROM messages
WHERE feed_id = 7
AND posted_at >= '2015-01-01 4:0:0'
AND posted_at < '2015-04-28 4:0:0'
AND parent_id IS NOT NULL -- match index condition
ORDER BY parent_id, posted_at DESC NULLS LAST
)
ORDER BY share_count DESC NULLS LAST, posted_at DESC NULLS LAST;CREATE INDEX messages_idx_null ON messages (feed_id, posted_at DESC NULLS LAST, share_count DESC NULLS LAST)
INCLUDE (parent_id, message_id)
WHERE parent_id IS NULL;
CREATE INDEX messages_idx_notnull ON messages (feed_id, posted_at DESC NULLS LAST, share_count DESC NULLS LAST)
INCLUDE (parent_id, message_id)
WHERE parent_id IS NOT NULL;Context
StackExchange Database Administrators Q#98971, answer score: 14
Revisions (0)
No revisions yet.