HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Optimizing ORDER BY in a full text search query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fullordersearchtextqueryoptimizing

Problem

I have a big table entities with ~ 15M records. I want to find top 5 rows matching 'hockey' in their name.

I have a full text index on name, which is used: gin_ix_entity_full_text_search_name

Query:

SELECT "entities".*,
         ts_rank(to_tsvector('english', "entities"."name"::text),
         to_tsquery('english', 'hockey'::text)) AS "rank0.48661998202865475"
    FROM "entities" 
         WHERE "entities"."place" = 'f' 
              AND (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'hockey'::text)) 
         ORDER BY "rank0.48661998202865475" DESC LIMIT 5


Duration 25,623 ms

Explain plan
1 Limit (cost=12666.89..12666.89 rows=5 width=3116)
2 -> Sort (cost=12666.89..12670.18 rows=6571 width=3116)
3 Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''hockey'''::tsquery))
4 -> Bitmap Heap Scan on entities (cost=124.06..12645.06 rows=6571 width=3116)
5 Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
6 Filter: (NOT place)
7 -> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..123.74 rows=6625 width=0)
8 Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)

I do not understand why it verifies the index condition twice. (Query plan step 4 and 7). Is it because of my boolean condition (not place)? If so, should I add it to my index to get a very fast query? Or does the sorting condition make it slow?

EXPLAIN ANALYZE output:

Limit (cost=4447.28..4447.29 rows=5 width=3116) (actual time=18509.274..18509.282 rows=5 loops=1)
-> Sort (cost=4447.28..4448.41 rows=2248 width=3116) (actual time=18509.271..18509.273 rows=5 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''test'''::tsquery))
Sort Method: top-N heapsort Memory: 19kB
-> Bitmap Heap Scan on entities (co

Solution

What I still don't understand, is why this is slower.

That sorting the rows will cost something is obvious. But why so much?

Without ORDER BY rank0... Postgres can just

  • pick the first 5 rows it finds and stop fetching rows as soon as it has 5.




Bitmap Heap Scan on entities ... rows=5 ...

  • then compute ts_rank() for just 5 rows.



In the second case, Postgres has to

  • fetch all (1495 according to your query plan) rows that qualify.




Bitmap Heap Scan on entities ... rows=1495 ...

  • compute ts_rank() for all of them.



  • sort all of them to find the first 5 according to the calculated value.



Try ORDER BY name just to see the cost of computing to_tsquery('english', 'hockey'::text)) for the superfluous rows and how much remains for fetching more rows and sorting.

Context

StackExchange Database Administrators Q#16437, answer score: 8

Revisions (0)

No revisions yet.