patternsqlMinor

Index on expression (regular expression pattern matching) is not used in query

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

expressionusedqueryregularindexnotpatternmatching

Problem

In my PostgreSQL 12.8 database, I have a relatively simple table the_table with a column value with type varchar:

CREATE TABLE public.the_table (
    id uuid DEFAULT gen_random_uuid() NOT NULL,
    label character varying,
    value character varying,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
);

I would like to query for all rows with a value that is formatted as an email address.
The query looks like this: SELECT FROM the_table WHERE value ~ '^[a-zA-Z0-9.$%&+/=?^_{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]+$'.

As there are a several million rows in that table, I try to speed up this query by adding a matching expression index with

CREATE INDEX index_the_table_on_email_values ON the_table ((value ~ '^[a-zA-Z0-9.$%&*+/=?^_{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]+$'));

Unfortunately, the query planner does not utilize the index and performs a full scan on the table instead, which is very slow.

Can anybody help me fixing the index or tell me what other options I have?
I already considered a generated boolean column is_email instead. I could add an index to that generated column and query it directly. But this seems like a weird workaround for the original problem, which should be solvable with a matching index, correct?

Solution

To be fair, your index on a boolean expression basically works, too.

The point is this: if there is a large percentage of "email" rows, no index is going to help (much) - except for special cases. Postgres will typically chose a faster sequential scan instead. (I suspect that's your case.)

And if there are only few "email" rows, a partial index instead is much more efficient as it excludes most rows to begin with:

CREATE INDEX the_table_email_idx ON the_table ((true))
WHERE value ~ '^[a-zA-Z0-9.$%&*+/=?^_{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]+

(true) is just an arbitrary constant, since there is no obvious index column. Typically, you have a useful index column on top of the "email filter" that can replace that constant - to make the index even more useful. Related:

Postgres partial index on IS NULL not working



Of course, the idea with a generated column is_email isn't that bad either. You would then create a partial index with the condition on that generated column instead. There are pros and cons to this.;

(true) is just an arbitrary constant, since there is no obvious index column. Typically, you have a useful index column on top of the "email filter" that can replace that constant - to make the index even more useful. Related:

Postgres partial index on IS NULL not working

Of course, the idea with a generated column is_email isn't that bad either. You would then create a partial index with the condition on that generated column instead. There are pros and cons to this.

Code Snippets

CREATE INDEX the_table_email_idx ON the_table ((true))
WHERE value ~ '^[a-zA-Z0-9.$%&*+/=?^_{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]+$';

Context

StackExchange Database Administrators Q#316765, answer score: 6

Revisions (0)

No revisions yet.