patternsqlMinor
Using ILIKE with unaccent and with only right end wildcard
Viewed 0 times
wildcardwithilikeusingendandunaccentonlyright
Problem
I use Postgresql 9.4 and I have a big table named foo. I want to search on it but I get long execution times if the search text is very short (e.g. "v") or long (e.g. "This is a search example with gin on table foo%"). In this cases my index is ignored. Here my search query:
This is my index:
Why is the index ignored and uses seq scan and/or Bitmap heap scan? How can I add other index to solve this problem?
Why it does a recheck?
Function
Query Plan:
```
Limit (cost=24412.85..67568.91 rows=100 width=301) (actual time=21838.473..21838.473 rows=0 loops=1)
Buffers: shared hit=1 read=749976
-> Bitmap Heap Scan on foo (cost=24412.85..4595502.73 rows=10592 width=301) (actual time=21838.470..21838.470 rows=0 loops=1)
Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))
Rows Removed by Index Recheck: 5416739
Filter: ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb)
Rows Removed by Filter: 2196
Heap Blocks: exact=749172
Buffers: shared hit=1 read=749976
-> Bitmap Index Scan on index_foo_on_name_de_gin (cost=0.00..24410.20 rows=10591544 width=0) (actual time=641.532..641.532 rows=5418935 loops=1)
Index Cond: (f_unaccent((name)::text) ~~* 'v%'::text)
Buffers: shared hit=1 read=804
Planning time:
EXPLAIN (ANALYZE, TIMING)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"is":["a"]}'
LIMIT 100;This is my index:
CREATE INDEX index_foo_on_name_de_gin ON foo USING gin(f_unaccent(name) gin_trgm_ops) WHERE locale = 'de';Why is the index ignored and uses seq scan and/or Bitmap heap scan? How can I add other index to solve this problem?
Why it does a recheck?
Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))Function
f_unaccent:CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS
$func$
SELECT unaccent('unaccent', $1)
$func$ LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;Query Plan:
```
Limit (cost=24412.85..67568.91 rows=100 width=301) (actual time=21838.473..21838.473 rows=0 loops=1)
Buffers: shared hit=1 read=749976
-> Bitmap Heap Scan on foo (cost=24412.85..4595502.73 rows=10592 width=301) (actual time=21838.470..21838.470 rows=0 loops=1)
Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))
Rows Removed by Index Recheck: 5416739
Filter: ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb)
Rows Removed by Filter: 2196
Heap Blocks: exact=749172
Buffers: shared hit=1 read=749976
-> Bitmap Index Scan on index_foo_on_name_de_gin (cost=0.00..24410.20 rows=10591544 width=0) (actual time=641.532..641.532 rows=5418935 loops=1)
Index Cond: (f_unaccent((name)::text) ~~* 'v%'::text)
Buffers: shared hit=1 read=804
Planning time:
Solution
f_unaccent()
Seems like you are using my function as defined here:
- Does PostgreSQL support “accent insensitive” collations?
Note the update I just made. This is better:
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT public.unaccent('public.unaccent', $1) -- schema-qualify function and dictionary
$func$;Detailed explanation over there.
- Recheck
Why it does a recheck?
The "Recheck Cond:" line is always in the
EXPLAIN output for bitmap index scans. Not to worry. Detailed explanation:- "Recheck Cond:" line in query plans with a bitmap index scan
- Index and query plan
Why is the index ignored
That's a misunderstanding. Your index is obviously not ignored. If Postgres expects to find enough rows so that some data pages in the main relation would have to be visited more than once (obviously the case with
rows=10591544), it switches from index scan to bitmap index scan - which is followed by a "Bitmap Heap Scan" to fetch actual tuples. Details:What makes this query really expensive is a combination of multiple unfortunate factors:
-
Neither index (Buffers: shared hit=1 read=804) nor table (
Buffers: shared hit=1 read=749976) were cached. If you repeat that query right away, it will be much faster, since all of it is cached by then. This is the worst case possible-
The search pattern
f_unaccent('v%') - or just 'v%' is a very bad case for a trigram index. Not very selective - but still selective enough to use it instead of an actual sequential scan. A text_pattern_ops index would be much faster for this. See below.More selective patterns (longer string) would also be much faster.
-
You had
LIMIT 100, so Postgres started out optimistically hoping to find 100 rows quickly. But the query returns with 0 rows (rows=0). This means that Postgres had to walk through all candidate rows unsuccessfully. Another worst case scenario. Your second predicate is to blame here:AND foo.configuration->'bar' @> '{"is":["a"]}'Postgres has only very limited statistics for
jsonb columns. It has no idea how selective that condition is going to be. If you have many queries on configuration->'bar', you could improve the situation drastically with another expression index ...- Index for finding an element in a JSON array
Possibly even a multicolumn index.
text_pattern_ops
For just left-anchored patterns ("right end wildcard"), you can make do without trigram indexes. But a plain btree index won't do, if you are using any locale in your DB other than the "C" locale (which is effectively "no locale"). Else you need special operator classes to ignore the locale. Like:
CREATE INDEX index_foo_name_pattern_ops_de ON foo (f_unaccent(name) text_pattern_ops)
WHERE locale = 'de';Details:
- Pattern matching with LIKE, SIMILAR TO or regular expressions
Code Snippets
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT public.unaccent('public.unaccent', $1) -- schema-qualify function and dictionary
$func$;AND foo.configuration->'bar' @> '{"is":["a"]}'CREATE INDEX index_foo_name_pattern_ops_de ON foo (f_unaccent(name) text_pattern_ops)
WHERE locale = 'de';Context
StackExchange Database Administrators Q#129202, answer score: 5
Revisions (0)
No revisions yet.