HiveBrain v1.2.0

Why does this limit make the postgres planner use a much slower index scan instead of a much faster bitmap heap/index scan?

Submitted by: @import:stackexchange-dba

Problem

I have a table of about 700k rows in Postgres 9.3.3, which has the following structure:

Columns:
 id            - integer
 article_text  - text
 publish_date  - timestamp without time zone
 published     - boolean

Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)
    "article_text_gin" gin (article_text)
    "articles_publish_date_id_index" btree (publish_date DESC NULLS LAST, id DESC)


The query that I am making has a full text search query and a limit, as follows:

When I search for a string which is in my index with a limit and order in the query it is fast:

```
explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index') order by id limit 10;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1293.88 rows=10 width=1298) (actual time=2.073..9.837 rows=10 loops=1)
   ->  Index Scan using articles_pkey on articles  (cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711 rows=10 loops=1)
         Filter: (article_text @@ '''in_index'''::tsquery)
         Rows Removed by Filter: 611
 Total runtime: 9.952 ms
```


However if the string is not in the index it takes much longer:

```
explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id limit 10;
                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1293.88 rows=10 width=1298) (actual time=5633.684..5633.684 rows=0 loops=1)
   ->  Index Scan using articles_pkey on articles  (...)
         Filter: (article_text @@ '''not_in_index'''::tsquery)
         Rows Removed by Filter: 796146
```

Solution

The big difference between the first two queries is that in the first one, the planner could walk the index backing the table's primary key (which also satisfies the ORDER BY clause) and filter out the rows that don't match the WHERE condition as it went. You can see that it had to visit only about 621 rows (the 10 that were returned plus 611 that were filtered out) before the LIMIT was satisfied.
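This also explains why the plan looked so cheap to the planner: a Limit node's cost is just the child's total cost scaled by the fraction of estimated rows it expects to need. A back-of-the-envelope check with the numbers from the fast plan above:

```
-- child: cost=0.42..462150.49, estimated rows=3573; limit wants 10 rows
462150.49 * (10 / 3573) + 0.42  ≈  1293.88   -- matches the Limit node's cost
```

That estimate is only as good as the assumption that matching rows are spread evenly through the index, which is exactly what breaks down when there are no matches at all.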

Now the second one followed the same logic, but not having found a single match (let alone 10), it had to walk the whole index and throw away every row (Rows Removed by Filter: 796146).
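A common workaround for this failure mode (not part of the original answer, so treat it as a sketch) is to make the ORDER BY expression unusable for the primary-key index, which forces the planner to use the GIN index first and sort the matches afterwards:

```
-- "id + 0" is an expression, so articles_pkey no longer matches the
-- ordering; the planner falls back to the GIN index on article_text.
select * from "articles"
where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index')
order by id + 0
limit 10;
```

With this shape, the worst case for a term with no matches becomes a cheap GIN lookup over zero rows instead of a scan of the entire table.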

The second pair, without ordering, chose a different plan, which in this case happened to be more effective for returning 0 rows :)
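The "without ordering" variant referred to here isn't shown in the excerpt above; a sketch of its shape (same search term, no ORDER BY):

```
-- With no ordering requirement, the planner is free to use the GIN index
-- (a bitmap index scan) instead of walking articles_pkey.
select * from "articles"
where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index')
limit 10;
```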

And the third pair, knowing it has to return lots of rows (it planned 3573 as opposed to 10), again went for a different plan, with a bitmap heap scan (not a bitmap index scan, as in the second pair). The time difference can be attributed mostly to this node:


```
Sort Method: external merge  Disk: 12288kB
```

If you raised work_mem to a higher value (say, 100 MB), this difference would most likely go away, since the sort could then be done in memory instead of spilling to disk.
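A sketch of that suggestion (session-level setting; 100MB is the value mentioned above, not a tuned recommendation):

```
-- Applies to the current session only; use ALTER SYSTEM / postgresql.conf
-- to change it globally.
set work_mem = '100MB';

-- Re-run the query whose plan showed the on-disk sort; if the sort now
-- fits in memory, EXPLAIN ANALYZE should report "Sort Method: quicksort"
-- instead of "external merge".
```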

Context

StackExchange Database Administrators Q#60419, answer score: 5
