patternsqlMinor
Optimizing ORDER BY in a full text search query
Viewed 0 times
fullordersearchtextqueryoptimizing
Problem
I have a big table
I have a full text index on
Query:
Duration 25,623 ms
Explain plan
1 Limit (cost=12666.89..12666.89 rows=5 width=3116)
2 -> Sort (cost=12666.89..12670.18 rows=6571 width=3116)
3 Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''hockey'''::tsquery))
4 -> Bitmap Heap Scan on entities (cost=124.06..12645.06 rows=6571 width=3116)
5 Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
6 Filter: (NOT place)
7 -> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..123.74 rows=6625 width=0)
8 Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
I do not understand why it verifies the index condition twice. (Query plan step 4 and 7). Is it because of my boolean condition (
Limit (cost=4447.28..4447.29 rows=5 width=3116) (actual time=18509.274..18509.282 rows=5 loops=1)
-> Sort (cost=4447.28..4448.41 rows=2248 width=3116) (actual time=18509.271..18509.273 rows=5 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''test'''::tsquery))
Sort Method: top-N heapsort Memory: 19kB
-> Bitmap Heap Scan on entities (co
entities with ~ 15M records. I want to find top 5 rows matching 'hockey' in their name.I have a full text index on
name, which is used: gin_ix_entity_full_text_search_nameQuery:
SELECT "entities".*,
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'hockey'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE "entities"."place" = 'f'
AND (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'hockey'::text))
ORDER BY "rank0.48661998202865475" DESC LIMIT 5Duration 25,623 ms
Explain plan
1 Limit (cost=12666.89..12666.89 rows=5 width=3116)
2 -> Sort (cost=12666.89..12670.18 rows=6571 width=3116)
3 Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''hockey'''::tsquery))
4 -> Bitmap Heap Scan on entities (cost=124.06..12645.06 rows=6571 width=3116)
5 Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
6 Filter: (NOT place)
7 -> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..123.74 rows=6625 width=0)
8 Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
I do not understand why it verifies the index condition twice. (Query plan step 4 and 7). Is it because of my boolean condition (
not place)? If so, should I add it to my index to get a very fast query? Or does the sorting condition make it slow?EXPLAIN ANALYZE output:Limit (cost=4447.28..4447.29 rows=5 width=3116) (actual time=18509.274..18509.282 rows=5 loops=1)
-> Sort (cost=4447.28..4448.41 rows=2248 width=3116) (actual time=18509.271..18509.273 rows=5 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''test'''::tsquery))
Sort Method: top-N heapsort Memory: 19kB
-> Bitmap Heap Scan on entities (co
Solution
What I still don't understand, is why this is slower.
That sorting the rows will cost something is obvious. But why so much?
Without
Bitmap Heap Scan on entities ... rows=5 ...
In the second case, Postgres has to
Bitmap Heap Scan on entities ... rows=1495 ...
Try
That sorting the rows will cost something is obvious. But why so much?
Without
ORDER BY rank0... Postgres can just- pick the first 5 rows it finds and stop fetching rows as soon as it has 5.
Bitmap Heap Scan on entities ... rows=5 ...
- then compute
ts_rank()for just 5 rows.
In the second case, Postgres has to
- fetch all (1495 according to your query plan) rows that qualify.
Bitmap Heap Scan on entities ... rows=1495 ...
- compute
ts_rank()for all of them.
- sort all of them to find the first 5 according to the calculated value.
Try
ORDER BY name just to see the cost of computing to_tsquery('english', 'hockey'::text)) for the superfluous rows and how much remains for fetching more rows and sorting.Context
StackExchange Database Administrators Q#16437, answer score: 8
Revisions (0)
No revisions yet.