HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Slow Query with LIMIT

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withlimitqueryslow

Problem

We have this query:

SELECT "messages".*
FROM "messages"
WHERE "messages"."account_id" = 1
AND "messages"."status" = 'unread'
AND (created_at >= '2014-04-24 01:00:00.000000')
ORDER BY id DESC


It runs fine under 0.1ms. But when we add LIMIT:

SELECT "messages".*
FROM "messages"
WHERE "messages"."account_id" = 1
AND "messages"."status" = 'unread'
AND (created_at >= '2014-04-24 01:00:00.000000')
ORDER BY id DESC
LIMIT 5


It runs over 20,000ms. Some notes:

  • The messages table has over 300,000 rows, not that big.



  • We have indexes on all 3 columns in the WHERE condition.



  • The account_id and created_at conditions filter out 3,000 messages from the 300,000 rows. Out of the 3,000 messages in that account_id, there are only 3 unread messages.



So we isolated each WHERE condition and found out about a certain behaviour -- we tweaked the LIMIT to correspond with the amount of unread messages:

SELECT "messages".*
FROM "messages"
WHERE "messages"."account_id" = 1
AND "messages"."status" = 'unread'
AND (created_at >= '2014-04-24 01:00:00.000000')
ORDER BY id DESC
LIMIT 3


It runs under 0.100ms. Any explanation for the difference in performance?

Solution

Adding LIMIT to the query often makes Postgres choose a different query plan. If statistics or cost estimates stray too far from actual data distribution / actual costs, you may end up with a slower query, even though Postgres reckoned it would be faster. Irregular data distribution may cause the query planner to misjudge the selectivity of WHERE conditions.

In particular, increasing the STATISTICS target for status, account_id, created_at and id might help.

Details:

  • How can I avoid PostgreSQL sometimes choosing a bad query plan for one of two nearly identical queries?



Aside from that, a (partial) multi column index may help. Depends on the complete situation. Which of your WHERE conditions are constant? How can they vary? Is some additional maintenance cost and disk space acceptable for a special index?

For example:

CREATE INDEX messages_special_idx ON messages (account_id, created_at, id)
WHERE status = 'unread'
AND   created_at >= '2014-01-01 01:00:00';


Run ANALYZE messages; before you try your queries again.

Code Snippets

CREATE INDEX messages_special_idx ON messages (account_id, created_at, id)
WHERE status = 'unread'
AND   created_at >= '2014-01-01 01:00:00';

Context

StackExchange Database Administrators Q#80988, answer score: 2

Revisions (0)

No revisions yet.