patternsqlModerate
Faster query with pattern-matching on multiple text fields
Viewed 0 times
withfieldsquerytextfastermultiplepatternmatching
Problem
I have one Postgres table with more than 20M tuples:
To filter the records I'm using:
Even with indexes the search is taking almost one minute to return results.
There are indexes for
What can I do to improve the performance of this query?
first_name | last_name | email
-------------------------------------------
bat | man | batman@wayne.com
arya | vidal | foo@email.com
max | joe | bar@email.comTo filter the records I'm using:
SELECT *
FROM people
WHERE (first_name || '' || last_name) ILIKE '%bat%man%' OR
first_name ILIKE '%bat%man%' OR
last_name ILIKE '%bat%man%' OR
email ILIKE '%bat%man%'
LIMIT 25 OFFSET 0Even with indexes the search is taking almost one minute to return results.
There are indexes for
(first_name || '' || last_name), first_name, last_name and email.What can I do to improve the performance of this query?
Solution
For your kind of pattern matching you best use a trigram index. Read this first:
I assume there's a typo in your expression
Assuming that either column can be NULL, you would need NULL-safe concatenation, the simple solution is
But that function is not
The wrapper can be
Either way, this is more verbose but has less internal overhead and is considerably faster:
Or, with hard-coded space character:
Make it
Basing the index on the last function, I suggest:
I added
Creating the index will take a while for 20M rows, best not during top load, or maybe use
Then your slightly adapted and simplified query should be fast and correct:
You only need the one index for this query.
Related:
- How is LIKE implemented?
I assume there's a typo in your expression
(first_name || '' || last_name), which makes no sense with an empty string, and you really want (first_name || ' ' || last_name) - with a space character.Assuming that either column can be NULL, you would need NULL-safe concatenation, the simple solution is
concat_ws():- Combine two columns and add into one new column
But that function is not
IMMUTABLE (explanation in the linked answer), so you cannot use it directly in an index expression. You could use an IMMUTABLE function wrapper:CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT concat_ws(s, t1, t2)';The wrapper can be
IMMUTABLE because it only takes text parameters. (But don't be tempted to coerce data types in the call for which text representation is not immutable!)Either way, this is more verbose but has less internal overhead and is considerably faster:
CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT CASE
WHEN t1 IS NULL THEN t2
WHEN t2 IS NULL THEN t1
ELSE t1 || s || t2
END
$func$;Or, with hard-coded space character:
CREATE OR REPLACE FUNCTION f_concat_space(t1 text, t2 text)
RETURNS text AS
LANGUAGE sql IMMUTABLE PARALLEL SAFE
$func$
SELECT CASE
WHEN t1 IS NULL THEN t2
WHEN t2 IS NULL THEN t1
ELSE t1 || ' ' || t2
END
$func$;Make it
PARALLEL SAFE (in Postgres 9.6 or later) to not stand in the way of parallelism. (And because it qualifies!)Basing the index on the last function, I suggest:
CREATE INDEX people_gin_trgm_idx ON people
USING gin (f_concat_space(first_name, last_name) gin_trgm_ops, email gin_trgm_ops);I added
email as second index column because you seem to be checking both at the same time.Creating the index will take a while for 20M rows, best not during top load, or maybe use
CREATE INDEX CONCURRENTLY .... A GIN index can be considerably bigger than a plain B-tree index and is more expensive to maintain. Be sure to run the latest version of Postgres, there have been major improvements for GIN indexes in recent versions.Then your slightly adapted and simplified query should be fast and correct:
SELECT *
FROM people
WHERE f_concat_space(first_name, last_name) ILIKE '%bat%man%'
OR email ILIKE '%bat%man%'
LIMIT 25;You only need the one index for this query.
Related:
- Pattern matching with LIKE, SIMILAR TO or regular expressions
- How to create an index to speed up an aggregate LIKE query on an expression?
- How to turn JSON array into Postgres array?
Code Snippets
CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT concat_ws(s, t1, t2)';CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT CASE
WHEN t1 IS NULL THEN t2
WHEN t2 IS NULL THEN t1
ELSE t1 || s || t2
END
$func$;CREATE OR REPLACE FUNCTION f_concat_space(t1 text, t2 text)
RETURNS text AS
LANGUAGE sql IMMUTABLE PARALLEL SAFE
$func$
SELECT CASE
WHEN t1 IS NULL THEN t2
WHEN t2 IS NULL THEN t1
ELSE t1 || ' ' || t2
END
$func$;CREATE INDEX people_gin_trgm_idx ON people
USING gin (f_concat_space(first_name, last_name) gin_trgm_ops, email gin_trgm_ops);SELECT *
FROM people
WHERE f_concat_space(first_name, last_name) ILIKE '%bat%man%'
OR email ILIKE '%bat%man%'
LIMIT 25;Context
StackExchange Database Administrators Q#117403, answer score: 18
Revisions (0)
No revisions yet.