HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

PostgreSQL: ... LIMIT 1 much slower than fetching all rows?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlrowsmuchalllimitthanslowerfetching

Problem

If I fetch only one row, the query takes much longer: 1433 ms vs 23 ms

Is there a work around?

Slow:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM "modwork_ticket" WHERE 
 "modwork_ticket"."email_sender_id" = 'foo@example.com' 
 ORDER BY "modwork_ticket"."date_created" ASC limit 1;

--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..542.14 rows=1 width=1146) (actual time=1433.128..1433.129 rows=1 loops=1)
   Buffers: shared hit=1466151 read=4661
   ->  Index Scan using modwork_ticket_date_created on modwork_ticket  (cost=0.43..606714.36 rows=1120 width=1146) (actual time=1433.125..1433
         Filter: ((email_sender_id)::text = 'foo@example.com'::text)
         Rows Removed by Filter: 1705251
         Buffers: shared hit=1466151 read=4661
 Planning time: 2.504 ms
 Execution time: 1433.218 ms


Fast:

```
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM "modwork_ticket" WHERE
"modwork_ticket"."email_sender_id" = 'foo@example.com'
ORDER BY "modwork_ticket"."date_created" ASC;

--------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4335.83..4338.63 rows=1120 width=1146) (actual time=23.637..23.794 rows=584 loops=1)
Sort Key: date_created
Sort Method: quicksort Memory: 732kB
Buffers: shared hit=544 read=4
-> Bitmap Heap Scan on modwork_ticket (cost=25.11..4279.11 rows=1120 width=1146) (actual time=20.479..22.595 rows=584 loops=1)
Recheck Cond: ((email_sender_id)::text = 'foo@example.com'::text)
Heap Blocks: exact=538
Buffers: shared hit=541 read=4
-> Bitmap Index Scan on modwork_ticket_email_sender_id_like (cost=0.00..24.83 rows=1120 width=0) (actual time=20.388..20.388 rows=
Index Cond: ((email_sender_id)::text = 'foo@example.com'::text)
Buffers: shared hit=3 read=4
Planning tim

Solution

The problem is that rows for 'foo@example.com' are deficient in values with an early date_created, but the planner doesn't know that. It thinks they are evenly scattered, so it will find one early on when walking modwork_ticket_date_created and will be able to stop then. But instead it has to walk through 1,705,251 of them.

You can create an index on (email_sender_id text_pattern_ops, date_created) and it will be able to jump directly to the tuple you want. That should be much faster than either existing plan. (I proposed the including text_pattern_ops purely because the name of the existing index has the word "like" in it, so I assume it was already defined that way--but maybe you actually have a column named "like". It would be better if you show us the definitions so we don't have to guess about things such as this). This index may be able to replace an existing one, so the overall number of indexes might stay the same.

If you don't want to create the index and just want it to the use the 2nd plan, you can force it by writing the query like this:

SELECT * FROM modwork_ticket WHERE 
 modwork_ticket.email_sender_id = 'foo@example.com' 
 ORDER BY modwork_ticket.date_created + '0 days' ASC limit 1;


By adding something to the date before you sort, it tricks PostgreSQL into thinking it can't use the date index.

Once v10 of PostgreSQL is released and you upgrade to it, you can create an advanced statistics definition for the functional dependency between the two columns, which might also be enough to get it to choose the better plan.

Code Snippets

SELECT * FROM modwork_ticket WHERE 
 modwork_ticket.email_sender_id = 'foo@example.com' 
 ORDER BY modwork_ticket.date_created + '0 days' ASC limit 1;

Context

StackExchange Database Administrators Q#185819, answer score: 4

Revisions (0)

No revisions yet.