debugsqlMinor
Trigram index for ILIKE patterns not working as expected
Viewed 0 times
workingpatternsexpectedilikeforindextrigramnot
Problem
I've got a simple but slow query:
Explain analyze:
Then, I created a trigram index:
Explain analyze after adding the index: (Yes, I
```
Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 145782626
SELECT DISTINCT title
FROM ja_jobs
WHERE title ILIKE '%RYAN WER%'
AND clientid = 31239
AND time_job > 1457826264
ORDER BY title
LIMIT 10;Explain analyze:
Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 msThen, I created a trigram index:
CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);Explain analyze after adding the index: (Yes, I
analyze)```
Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 145782626
Solution
You have a lot of indexes. I doubt you need all of them. Check whether all of them are in use. Instructions in the manual, chapter Examining Index Usage.
If your system is configured to gather statistics, it will be particularly revealing to study:
These statistics are also displayed in pgAdmin.
Some indexes are particularly odd, like:
Useless indexes are not going to impair read performance (much), but they are a burden on write performance and general maintenance.
The major difficulty for your query is to estimate the selectivity of your various predicates. That's comparatively simple for the
In your case (UPDATE 3), Postgres estimates to find 482 rows matching the pattern, but it turns out to be just a single row:
Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)
Tuning the query depends on the complete picture: Cardinalities, data distribution, hardware, load, concurrency, query frequencies, priorities ...
It may help to increase the statistics target for involved columns:
Then:
Estimating selectivity of free-floating
There have been improvements for the latest version, but the biggest improvements come with Postgres 9.6 and its new version of the pg_trgm module. Consider the release notes here. Related:
There are various other ways to improve performance, depending on all the information that's not in your question. Maybe a multicolumn index, or a partial index or a CTE in your query.
All the general advice for performance tuning applies as well:
-
Configuring PostgreSQL for read performance
-
http://wiki.postgresql.org/wiki/Performance_Optimization
You already removed the expensive
If you actually only deal with left-anchored
So:
Index columns in this order. Equality first, range later. Explanation:
You can extend this solution to cover
And adapt your query:
If your system is configured to gather statistics, it will be particularly revealing to study:
SELECT * FROM pg_stat_user_indexesThese statistics are also displayed in pgAdmin.
Some indexes are particularly odd, like:
"ix_ja_jobs_gsdi_pk" "btree" (("id"::"text")) - why would anyone cast a bigint id to text for indexing?Useless indexes are not going to impair read performance (much), but they are a burden on write performance and general maintenance.
The major difficulty for your query is to estimate the selectivity of your various predicates. That's comparatively simple for the
bigint columns clientid and time_job, but hard for pattern matching (title ILIKE 'Hislop 13035%').In your case (UPDATE 3), Postgres estimates to find 482 rows matching the pattern, but it turns out to be just a single row:
Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)
Tuning the query depends on the complete picture: Cardinalities, data distribution, hardware, load, concurrency, query frequencies, priorities ...
It may help to increase the statistics target for involved columns:
ALTER ja_jobs
ALTER clientid SET STATISTICS 1000
, ALTER time_job SET STATISTICS 1000
, ALTER title SET STATISTICS 1000;Then:
ANALYZE ja_jobs; But don't expect much. Details:- Check statistics targets in PostgreSQL
Estimating selectivity of free-floating
LIKE patterns is hard. Left-anchored is easier - you are mixing these two rather different cases freely: ILIKE '%RYAN WER%' (with leading wildcard) is much more complex than ILIKE 'Hislop 13035%'. Overview:- Pattern matching with LIKE, SIMILAR TO or regular expressions
There have been improvements for the latest version, but the biggest improvements come with Postgres 9.6 and its new version of the pg_trgm module. Consider the release notes here. Related:
- Trigram search gets much slower as search string gets longer
There are various other ways to improve performance, depending on all the information that's not in your question. Maybe a multicolumn index, or a partial index or a CTE in your query.
All the general advice for performance tuning applies as well:
-
Configuring PostgreSQL for read performance
-
http://wiki.postgresql.org/wiki/Performance_Optimization
You already removed the expensive
DISTINCT. If you don't need ORDER BY title, it may change the query plan completely to drop that, too: With both of these removed, Postgres is free to pick the first 10 matches and ignore the rest. Else, all matches have to be found and considered. can be much more expensive. Try:SELECT title
FROM ja_jobs
WHERE title ILIKE 'Hislop 13035%'
AND clientid = 2565
AND time_job > 1382496599
LIMIT 10; -- no ORDER BYIf you actually only deal with left-anchored
LIKE pattern (trailing wildcards like 'Hislop 13035%', but not: '%RYAN WER%'), then you can use a very fast varchar_pattern_ops index. Detailed explanation:- Operator "~
So:
CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs
(clientid, title varchar_pattern_ops, time_job);Index columns in this order. Equality first, range later. Explanation:
- Multicolumn index and performance
You can extend this solution to cover
ILIKE with a functional elementCREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs
(clientid, lower(title) varchar_pattern_ops, time_job);And adapt your query:
SELECT title
FROM ja_jobs
WHERE lower(title) LIKE lower('Hislop 13035%')
AND clientid = 2565
AND time_job > 1382496599
LIMIT 10;- Index on column with data type citext not used
Code Snippets
SELECT * FROM pg_stat_user_indexesALTER ja_jobs
ALTER clientid SET STATISTICS 1000
, ALTER time_job SET STATISTICS 1000
, ALTER title SET STATISTICS 1000;SELECT title
FROM ja_jobs
WHERE title ILIKE 'Hislop 13035%'
AND clientid = 2565
AND time_job > 1382496599
LIMIT 10; -- no ORDER BYCREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs
(clientid, title varchar_pattern_ops, time_job);CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs
(clientid, lower(title) varchar_pattern_ops, time_job);Context
StackExchange Database Administrators Q#138288, answer score: 3
Revisions (0)
No revisions yet.