HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Faster query with pattern-matching on multiple text fields

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withfieldsquerytextfastermultiplepatternmatching

Problem

I have one Postgres table with more than 20M tuples:

first_name | last_name | email
-------------------------------------------
bat        | man       | batman@wayne.com
arya       | vidal     | foo@email.com
max        | joe       | bar@email.com


To filter the records I'm using:

SELECT *
  FROM people
WHERE (first_name || '' || last_name) ILIKE '%bat%man%' OR 
    first_name ILIKE '%bat%man%'  OR  
    last_name ILIKE '%bat%man%'   OR
    email ILIKE '%bat%man%'
    LIMIT 25 OFFSET 0


Even with indexes the search is taking almost one minute to return results.

There are indexes for (first_name || '' || last_name), first_name, last_name and email.

What can I do to improve the performance of this query?

Solution

For your kind of pattern matching you best use a trigram index. Read this first:

  • How is LIKE implemented?



I assume there's a typo in your expression (first_name || '' || last_name), which makes no sense with an empty string, and you really want (first_name || ' ' || last_name) - with a space character.

Assuming that either column can be NULL, you would need NULL-safe concatenation, the simple solution is concat_ws():

  • Combine two columns and add into one new column



But that function is not IMMUTABLE (explanation in the linked answer), so you cannot use it directly in an index expression. You could use an IMMUTABLE function wrapper:

CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT concat_ws(s, t1, t2)';


The wrapper can be IMMUTABLE because it only takes text parameters. (But don't be tempted to coerce data types in the call for which text representation is not immutable!)

Either way, this is more verbose but has less internal overhead and is considerably faster:

CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT CASE
         WHEN t1 IS NULL THEN t2
         WHEN t2 IS NULL THEN t1
         ELSE t1 || s || t2
       END
$func$;


Or, with hard-coded space character:

CREATE OR REPLACE FUNCTION f_concat_space(t1 text, t2 text)
  RETURNS text AS
  LANGUAGE sql IMMUTABLE PARALLEL SAFE
$func$
SELECT CASE
         WHEN t1 IS NULL THEN t2
         WHEN t2 IS NULL THEN t1
         ELSE t1 || ' ' || t2
       END
$func$;


Make it PARALLEL SAFE (in Postgres 9.6 or later) to not stand in the way of parallelism. (And because it qualifies!)

Basing the index on the last function, I suggest:

CREATE INDEX people_gin_trgm_idx ON people
USING gin (f_concat_space(first_name, last_name) gin_trgm_ops, email gin_trgm_ops);


I added email as second index column because you seem to be checking both at the same time.

Creating the index will take a while for 20M rows, best not during top load, or maybe use CREATE INDEX CONCURRENTLY .... A GIN index can be considerably bigger than a plain B-tree index and is more expensive to maintain. Be sure to run the latest version of Postgres, there have been major improvements for GIN indexes in recent versions.

Then your slightly adapted and simplified query should be fast and correct:

SELECT *
FROM   people
WHERE  f_concat_space(first_name, last_name) ILIKE '%bat%man%'
    OR email ILIKE '%bat%man%'
LIMIT  25;


You only need the one index for this query.

Related:

  • Pattern matching with LIKE, SIMILAR TO or regular expressions



  • How to create an index to speed up an aggregate LIKE query on an expression?



  • How to turn JSON array into Postgres array?

Code Snippets

CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT concat_ws(s, t1, t2)';
CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT CASE
         WHEN t1 IS NULL THEN t2
         WHEN t2 IS NULL THEN t1
         ELSE t1 || s || t2
       END
$func$;
CREATE OR REPLACE FUNCTION f_concat_space(t1 text, t2 text)
  RETURNS text AS
  LANGUAGE sql IMMUTABLE PARALLEL SAFE
$func$
SELECT CASE
         WHEN t1 IS NULL THEN t2
         WHEN t2 IS NULL THEN t1
         ELSE t1 || ' ' || t2
       END
$func$;
CREATE INDEX people_gin_trgm_idx ON people
USING gin (f_concat_space(first_name, last_name) gin_trgm_ops, email gin_trgm_ops);
SELECT *
FROM   people
WHERE  f_concat_space(first_name, last_name) ILIKE '%bat%man%'
    OR email ILIKE '%bat%man%'
LIMIT  25;

Context

StackExchange Database Administrators Q#117403, answer score: 18

Revisions (0)

No revisions yet.