patternsqlMinor
Index on expression (regular expression pattern matching) is not used in query
Viewed 0 times
expressionusedqueryregularindexnotpatternmatching
Problem
In my PostgreSQL 12.8 database, I have a relatively simple table
I would like to query for all rows with a value that is formatted as an email address.
The query looks like this:
As there are a several million rows in that table, I try to speed up this query by adding a matching expression index with
Unfortunately, the query planner does not utilize the index and performs a full scan on the table instead, which is very slow.
Can anybody help me fixing the index or tell me what other options I have?
I already considered a generated boolean column
the_table with a column value with type varchar:CREATE TABLE public.the_table (
id uuid DEFAULT gen_random_uuid() NOT NULL,
label character varying,
value character varying,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
);I would like to query for all rows with a value that is formatted as an email address.
The query looks like this:
SELECT FROM the_table WHERE value ~ '^[a-zA-Z0-9.$%&+/=?^_{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]+$'.As there are a several million rows in that table, I try to speed up this query by adding a matching expression index with
CREATE INDEX index_the_table_on_email_values ON the_table ((value ~ '^[a-zA-Z0-9.$%&*+/=?^_{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]+$'));Unfortunately, the query planner does not utilize the index and performs a full scan on the table instead, which is very slow.
Can anybody help me fixing the index or tell me what other options I have?
I already considered a generated boolean column
is_email instead. I could add an index to that generated column and query it directly. But this seems like a weird workaround for the original problem, which should be solvable with a matching index, correct?Solution
To be fair, your index on a
The point is this: if there is a large percentage of "email" rows, no index is going to help (much) - except for special cases. Postgres will typically chose a faster sequential scan instead. (I suspect that's your case.)
And if there are only few "email" rows, a partial index instead is much more efficient as it excludes most rows to begin with:
Of course, the idea with a generated column
boolean expression basically works, too.The point is this: if there is a large percentage of "email" rows, no index is going to help (much) - except for special cases. Postgres will typically chose a faster sequential scan instead. (I suspect that's your case.)
And if there are only few "email" rows, a partial index instead is much more efficient as it excludes most rows to begin with:
CREATE INDEX the_table_email_idx ON the_table ((true))
WHERE value ~ '^[a-zA-Z0-9.$%&*+/=?^_{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]+
(true) is just an arbitrary constant, since there is no obvious index column. Typically, you have a useful index column on top of the "email filter" that can replace that constant - to make the index even more useful. Related:
- Postgres partial index on IS NULL not working
Of course, the idea with a generated column is_email isn't that bad either. You would then create a partial index with the condition on that generated column instead. There are pros and cons to this.;(true) is just an arbitrary constant, since there is no obvious index column. Typically, you have a useful index column on top of the "email filter" that can replace that constant - to make the index even more useful. Related:- Postgres partial index on IS NULL not working
Of course, the idea with a generated column
is_email isn't that bad either. You would then create a partial index with the condition on that generated column instead. There are pros and cons to this.Code Snippets
CREATE INDEX the_table_email_idx ON the_table ((true))
WHERE value ~ '^[a-zA-Z0-9.$%&*+/=?^_{|}~-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]+$';Context
StackExchange Database Administrators Q#316765, answer score: 6
Revisions (0)
No revisions yet.