patternsqlpostgresqlModerate
pg_trgm trigram indexes for fast fuzzy and LIKE/ILIKE search
Viewed 0 times
pg_trgmtrigram indexfuzzy searchILIKE indexsimilaritynearest neighborautocomplete
Problem
LIKE '%keyword%' and ILIKE '%keyword%' with a leading wildcard force a sequential scan; standard B-tree indexes cannot be used for infix pattern matching.
Solution
Use the pg_trgm extension with a GIN or GiST index:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- GIN is faster for lookups; GiST is faster to build and supports similarity ORDER BY:
CREATE INDEX idx_products_name_trgm
ON products USING gin(name gin_trgm_ops);
-- Now LIKE, ILIKE, and ~ queries use the index:
SELECT * FROM products WHERE name ILIKE '%wireless%';
-- Similarity search (fuzzy matching):
SELECT name, similarity(name, 'wireles') AS sim
FROM products
WHERE similarity(name, 'wireles') > 0.3
ORDER BY sim DESC
LIMIT 10;
-- Nearest-neighbor search (GiST only):
SELECT name FROM products
ORDER BY name <-> 'wireles'
LIMIT 10;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- GIN is faster for lookups; GiST is faster to build and supports similarity ORDER BY:
CREATE INDEX idx_products_name_trgm
ON products USING gin(name gin_trgm_ops);
-- Now LIKE, ILIKE, and ~ queries use the index:
SELECT * FROM products WHERE name ILIKE '%wireless%';
-- Similarity search (fuzzy matching):
SELECT name, similarity(name, 'wireles') AS sim
FROM products
WHERE similarity(name, 'wireles') > 0.3
ORDER BY sim DESC
LIMIT 10;
-- Nearest-neighbor search (GiST only):
SELECT name FROM products
ORDER BY name <-> 'wireles'
LIMIT 10;
Why
Trigrams are all 3-character substrings of a string. GIN/GiST indexes on trigrams let the planner look up the specific trigrams present in the search pattern and intersect their posting lists, avoiding a full scan.
Gotchas
- Short search patterns (< 3 characters) cannot use trigram indexes; they fall back to seq scan
- GIN trigram indexes are large (roughly 2-4x the column size); monitor with pg_relation_size()
- pg_trgm.similarity_threshold GUC controls the % operator threshold (default 0.3)
- For full-text search, tsvector/GIN is more precise; pg_trgm is better for autocomplete and typo tolerance
Code Snippets
Inspect trigrams generated from a string
-- Check trigram decomposition:
SELECT show_trgm('wireless');
-- {" w"," wi","ess","ire","les","rel","wir"}Context
Search boxes, autocomplete, and typo-tolerant lookups on name or description columns
Revisions (0)
No revisions yet.