Why is LIMIT killing performance of this Postgres query?
Problem
I have a Postgres materialized view:
```
      Column       |       Type        | Modifiers
-------------------+-------------------+-----------
 document_id       | character varying |
 recorded_date     | date              |
 parcels           | jsonb             |
Indexes:
    "index_my_view_on_document_id" btree (document_id)
    "index_my_view_on_recorded_date" btree (recorded_date)
    "index_my_view_on_parcels" gin (parcels)
```

I'm trying to execute a paged query that filters on the `parcels` jsonb array field, but my performance tanks whenever I add LIMIT.

Without LIMIT:

```
EXPLAIN ANALYZE SELECT document_id FROM my_view WHERE (parcels @> '[3022890014]') ORDER BY recorded_date DESC;
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=24178.50..24194.79 rows=6518 width=21) (actual time=11.272..11.275 rows=22 loops=1)
   Sort Key: recorded_date DESC
   Sort Method: quicksort  Memory: 26kB
   ->  Bitmap Heap Scan on my_view  (cost=78.51..23765.58 rows=6518 width=21) (actual time=3.199..10.281 rows=22 loops=1)
         Recheck Cond: (parcels @> '[3022890014]'::jsonb)
         Heap Blocks: exact=12
         ->  Bitmap Index Scan on index_my_view_on_parcels  (cost=0.00..76.88 rows=6518 width=0) (actual time=3.166..3.166 rows=22 loops=1)
               Index Cond: (parcels @> '[3022890014]'::jsonb)
 Planning time: 2.201 ms
 Execution time: 11.395 ms
(10 rows)
```

With LIMIT:

```
EXPLAIN ANALYZE SELECT document_id FROM my_view WHERE (parcels @> '[3022890014]') ORDER BY recorded_date DESC LIMIT 25;
QUERY PLAN
-----------
```
Solution
PostgreSQL estimates that 6518 rows will meet your condition. So when you tell it to stop at 25, it reasons that it would rather walk the recorded_date index, which returns rows already in the requested order, and stop once it finds the 25th match, which it expects to happen after 25/6518, or about 0.4%, of the table. But in reality only 22 rows meet the condition, so it ends up scanning the whole table instead, roughly 250 times more work than it thought it would be. The other plan, using the gin index, ends up being over 250 times less work than PostgreSQL thinks it will be, for the same reason: it expects to find and then sort 6518 things when there will really be only 22.
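If changing the data model isn't an option, one common workaround (a sketch, not part of the original answer) is to keep the planner from choosing the recorded_date index walk: wrap the containment filter in a subquery with `OFFSET 0`, which PostgreSQL does not pull up, so the filter runs first and the sort-plus-limit is applied to its small result. On PostgreSQL 12 and later, `WITH ... AS MATERIALIZED` achieves the same fencing more explicitly.

```sql
-- Sketch of a planner-fencing workaround (assumption, not from the answer):
-- the OFFSET 0 subquery is evaluated on its own, so the GIN-backed filter
-- runs before the ORDER BY ... LIMIT, regardless of the row estimate.
SELECT document_id
FROM (
    SELECT document_id, recorded_date
    FROM my_view
    WHERE parcels @> '[3022890014]'
    OFFSET 0  -- optimization fence: prevents subquery pull-up
) sub
ORDER BY recorded_date DESC
LIMIT 25;
```

This trades a tidy query for plan stability; it's worth re-checking with EXPLAIN ANALYZE after any major version upgrade, since fencing behavior is an implementation detail rather than a guarantee.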
If you used a more appropriate data structure, like a regular PostgreSQL array rather than a degenerate JSONB object, then it would have a more accurate idea of how many rows will meet the condition and would probably make a better choice.
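A hedged sketch of that migration, assuming the view can be rebuilt with `parcels` as a native `bigint[]` (the view and index names here are illustrative; in practice you would rebuild from the base tables):

```sql
-- Hypothetical rebuild with a native array column instead of jsonb.
CREATE MATERIALIZED VIEW my_view_arr AS
SELECT document_id,
       recorded_date,
       -- unpack the jsonb array into bigint[]
       ARRAY(SELECT elem::bigint
             FROM jsonb_array_elements_text(parcels) AS elem) AS parcels
FROM my_view;

-- Native arrays get per-element statistics (most_common_elems),
-- which is what lets the planner estimate @> selectivity accurately.
CREATE INDEX index_my_view_arr_on_parcels
    ON my_view_arr USING gin (parcels);

-- Containment uses the same @> operator, now with array syntax:
SELECT document_id
FROM my_view_arr
WHERE parcels @> ARRAY[3022890014]
ORDER BY recorded_date DESC
LIMIT 25;
```

With a row estimate close to the true 22 matches, the planner should prefer the GIN bitmap scan plus a cheap sort over walking the recorded_date index.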
Context
StackExchange Database Administrators Q#208065, answer score: 11