patternsqlMinor
PostgreSQL: ... LIMIT 1 much slower than fetching all rows?
Viewed 0 times
postgresqlrowsmuchalllimitthanslowerfetching
Problem
If I fetch only one row, the query takes much longer: 1433 ms vs 23 ms
Is there a work around?
Slow:
Fast:
```
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM "modwork_ticket" WHERE
"modwork_ticket"."email_sender_id" = 'foo@example.com'
ORDER BY "modwork_ticket"."date_created" ASC;
--------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4335.83..4338.63 rows=1120 width=1146) (actual time=23.637..23.794 rows=584 loops=1)
Sort Key: date_created
Sort Method: quicksort Memory: 732kB
Buffers: shared hit=544 read=4
-> Bitmap Heap Scan on modwork_ticket (cost=25.11..4279.11 rows=1120 width=1146) (actual time=20.479..22.595 rows=584 loops=1)
Recheck Cond: ((email_sender_id)::text = 'foo@example.com'::text)
Heap Blocks: exact=538
Buffers: shared hit=541 read=4
-> Bitmap Index Scan on modwork_ticket_email_sender_id_like (cost=0.00..24.83 rows=1120 width=0) (actual time=20.388..20.388 rows=
Index Cond: ((email_sender_id)::text = 'foo@example.com'::text)
Buffers: shared hit=3 read=4
Planning tim
Is there a work around?
Slow:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM "modwork_ticket" WHERE
"modwork_ticket"."email_sender_id" = 'foo@example.com'
ORDER BY "modwork_ticket"."date_created" ASC limit 1;
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..542.14 rows=1 width=1146) (actual time=1433.128..1433.129 rows=1 loops=1)
Buffers: shared hit=1466151 read=4661
-> Index Scan using modwork_ticket_date_created on modwork_ticket (cost=0.43..606714.36 rows=1120 width=1146) (actual time=1433.125..1433
Filter: ((email_sender_id)::text = 'foo@example.com'::text)
Rows Removed by Filter: 1705251
Buffers: shared hit=1466151 read=4661
Planning time: 2.504 ms
Execution time: 1433.218 msFast:
```
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM "modwork_ticket" WHERE
"modwork_ticket"."email_sender_id" = 'foo@example.com'
ORDER BY "modwork_ticket"."date_created" ASC;
--------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4335.83..4338.63 rows=1120 width=1146) (actual time=23.637..23.794 rows=584 loops=1)
Sort Key: date_created
Sort Method: quicksort Memory: 732kB
Buffers: shared hit=544 read=4
-> Bitmap Heap Scan on modwork_ticket (cost=25.11..4279.11 rows=1120 width=1146) (actual time=20.479..22.595 rows=584 loops=1)
Recheck Cond: ((email_sender_id)::text = 'foo@example.com'::text)
Heap Blocks: exact=538
Buffers: shared hit=541 read=4
-> Bitmap Index Scan on modwork_ticket_email_sender_id_like (cost=0.00..24.83 rows=1120 width=0) (actual time=20.388..20.388 rows=
Index Cond: ((email_sender_id)::text = 'foo@example.com'::text)
Buffers: shared hit=3 read=4
Planning tim
Solution
The problem is that rows for 'foo@example.com' are deficient in values with an early date_created, but the planner doesn't know that. It thinks they are evenly scattered, so it will find one early on when walking
You can create an index on
If you don't want to create the index and just want it to the use the 2nd plan, you can force it by writing the query like this:
By adding something to the date before you sort, it tricks
Once v10 of
modwork_ticket_date_created and will be able to stop then. But instead it has to walk through 1,705,251 of them.You can create an index on
(email_sender_id text_pattern_ops, date_created) and it will be able to jump directly to the tuple you want. That should be much faster than either existing plan. (I proposed the including text_pattern_ops purely because the name of the existing index has the word "like" in it, so I assume it was already defined that way--but maybe you actually have a column named "like". It would be better if you show us the definitions so we don't have to guess about things such as this). This index may be able to replace an existing one, so the overall number of indexes might stay the same.If you don't want to create the index and just want it to the use the 2nd plan, you can force it by writing the query like this:
SELECT * FROM modwork_ticket WHERE
modwork_ticket.email_sender_id = 'foo@example.com'
ORDER BY modwork_ticket.date_created + '0 days' ASC limit 1;By adding something to the date before you sort, it tricks
PostgreSQL into thinking it can't use the date index.Once v10 of
PostgreSQL is released and you upgrade to it, you can create an advanced statistics definition for the functional dependency between the two columns, which might also be enough to get it to choose the better plan.Code Snippets
SELECT * FROM modwork_ticket WHERE
modwork_ticket.email_sender_id = 'foo@example.com'
ORDER BY modwork_ticket.date_created + '0 days' ASC limit 1;Context
StackExchange Database Administrators Q#185819, answer score: 4
Revisions (0)
No revisions yet.