HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

FTS is slow when searched word appears frequently in rows

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsftssearchedslowfrequentlywordappearswhen

Problem

I have a message table with a content column indexed for full text search:

create index message_fts on message using GIN(to_tsvector('english', content));


The problem I have and that I'd like to understand is that searches involving frequent words are very slow.

Here's a normal fast enough query on a rare "word":

miaou=> explain analyze select count(*) from message where (to_tsvector('english', content) @@ to_tsquery('fajre'));
                               QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=16808.35..16808.36 rows=1 width=8) (actual time=6.565..6.565 rows=1 loops=1)
   ->  Gather  (cost=16808.13..16808.34 rows=2 width=8) (actual time=1.608..6.556 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=15808.13..15808.14 rows=1 width=8) (actual time=0.456..0.457 rows=1 loops=3)
               ->  Parallel Bitmap Heap Scan on message  (cost=293.28..15803.69 rows=1776 width=0) (actual time=0.453..0.453 rows=0 loops=3)
                     Recheck Cond: (to_tsvector('english'::regconfig, content) @@ to_tsquery('fajre'::text))
                     ->  Bitmap Index Scan on message_fts  (cost=0.00..292.22 rows=4262 width=0) (actual time=0.954..0.954 rows=0 loops=1)
                           Index Cond: (to_tsvector('english'::regconfig, content) @@ to_tsquery('fajre'::text))
 Planning time: 0.249 ms
 Execution time: 8.400 ms
(11 rows)


And here's a very slow query on a frequent term:

```
miaou=> explain analyze select count(*) from message where (to_tsvector('english', content) @@ to_tsquery('faire'));
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.

QUERY PLAN

Solution

Your bitmap has grown too large to fit in work_mem, and so it has "gone lossy" where it records just the blocks that contains matches (for some blocks), and then needs to recheck each row in those blocks.

Rows Removed by Index Recheck: 1329552
                 Heap Blocks: exact=12673 lossy=22516


Doing this is slow. Especially so because you are using a functional index, and so it has to re-parse content for each of those rows as the parsed form is not stored in the table. (That re-parsing is also why you get the NOTICEs).

Increase work_mem so that it can hold the entire bitmap.

Code Snippets

Rows Removed by Index Recheck: 1329552
                 Heap Blocks: exact=12673 lossy=22516

Context

StackExchange Database Administrators Q#209121, answer score: 3

Revisions (0)

No revisions yet.