HiveBrain v1.2.0

Why is LIMIT killing performance of this Postgres query?

Submitted by: @import:stackexchange-dba

Problem

I have a Postgres materialized view:

```
       Column        |       Type        | Modifiers
---------------------+-------------------+-----------
 document_id         | character varying |
 recorded_date       | date              |
 parcels             | jsonb             |
Indexes:
    "index_my_view_on_document_id" btree (document_id)
    "index_my_view_on_recorded_date" btree (recorded_date)
    "index_my_view_on_parcels" gin (parcels)
```


I'm trying to execute a paged query that filters on the parcels jsonb array field, but my performance tanks whenever I add LIMIT:

Without LIMIT:

```
EXPLAIN ANALYZE SELECT document_id FROM my_view WHERE (parcels @> '[3022890014]') ORDER BY recorded_date DESC;
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=24178.50..24194.79 rows=6518 width=21) (actual time=11.272..11.275 rows=22 loops=1)
   Sort Key: recorded_date DESC
   Sort Method: quicksort  Memory: 26kB
   ->  Bitmap Heap Scan on my_view  (cost=78.51..23765.58 rows=6518 width=21) (actual time=3.199..10.281 rows=22 loops=1)
         Recheck Cond: (parcels @> '[3022890014]'::jsonb)
         Heap Blocks: exact=12
         ->  Bitmap Index Scan on index_my_view_on_parcels  (cost=0.00..76.88 rows=6518 width=0) (actual time=3.166..3.166 rows=22 loops=1)
               Index Cond: (parcels @> '[3022890014]'::jsonb)
 Planning time: 2.201 ms
 Execution time: 11.395 ms
(10 rows)
```


With LIMIT:

```
EXPLAIN ANALYZE SELECT document_id FROM my_view WHERE (parcels @> '[3022890014]') ORDER BY recorded_date DESC LIMIT 25;
QUERY PLAN
-----------
```

Solution

PostgreSQL estimates it will find 6518 rows meeting your condition. So when you tell it to stop at 25, it decides it would rather walk the index on recorded_date, which returns rows already in the desired order, and stop once it finds the 25th match, which it expects to happen after scanning 25/6518, or 0.4%, of the table. But in reality only 22 rows meet the condition, so it ends up scanning the whole table instead, about 250 times more work than it expected. The other plan, using the GIN index, ends up being over 250 times less work than PostgreSQL estimates, for the same reason: it thinks it will have to find and then sort 6518 rows when it will really be 22.
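
A common workaround for this planner trap (not part of the original answer, so treat it as a sketch) is to keep the planner from walking the recorded_date index for the ORDER BY: wrap the sort column in a no-op expression. Adding an integer to a date is still a date in Postgres, but the expression no longer matches the plain index on recorded_date, so the planner falls back to the fast GIN bitmap scan plus an explicit sort:

```sql
-- Workaround sketch: recorded_date + 0 produces the same values,
-- but does not match index_my_view_on_recorded_date, so the planner
-- cannot choose the "scan in order, stop at 25" plan.
EXPLAIN ANALYZE
SELECT document_id
FROM my_view
WHERE parcels @> '[3022890014]'
ORDER BY recorded_date + 0 DESC   -- date + integer is still a date
LIMIT 25;
```

Sorting 22 rows explicitly is cheap; the point is to force the bitmap scan on the GIN index rather than the misestimated ordered scan.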

If you used a more appropriate data structure, like a regular PostgreSQL array rather than a degenerate JSONB object, then it would have a more accurate idea of how many rows will meet the condition and would probably make a better choice.
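
A sketch of that suggestion, assuming the view is defined over a hypothetical `base_table` and that parcels is a flat JSON array of numbers (both assumptions, not from the original question): redefine the materialized view with a native bigint[] column, index it with GIN, and query with array containment so the planner gets real element statistics.

```sql
-- Hypothetical rewrite: base_table and its column layout are assumed.
DROP MATERIALIZED VIEW IF EXISTS my_view;

CREATE MATERIALIZED VIEW my_view AS
SELECT document_id,
       recorded_date,
       -- unpack the jsonb array into a native bigint[]
       ARRAY(SELECT value::bigint
             FROM jsonb_array_elements_text(parcels) AS t(value)) AS parcels
FROM base_table;

CREATE INDEX index_my_view_on_parcels ON my_view USING gin (parcels);

-- Array containment instead of jsonb containment; selectivity is now
-- estimated from per-element array statistics.
SELECT document_id
FROM my_view
WHERE parcels @> ARRAY[3022890014]
ORDER BY recorded_date DESC
LIMIT 25;
```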

Context

StackExchange Database Administrators Q#208065, answer score: 11
