patternsqlMinor
PostGIS - indexing - speed up query
Viewed 0 times
postgisindexingqueryspeed
Problem
I have this query (called from PHP, hence the
There is
It runs about 40ms for
Why? Can it be speed up or perform in a more stable manner?
$ variables):SELECT a.name,
St_AsText(a.geom),
b.name AS name_b,
b.id
FROM a
LEFT JOIN b ON b.id = a.b_id
WHERE UNACCENT(LOWER(a.name)) LIKE UNACCENT('$searched%')
ORDER BY a.geom 'SRID=4326;POINT($lat $lon)'::geometry
LIMIT 2;There is
lower(a.name) BTree index on a.name, GIST index on a.geom and b.id is primary keyIt runs about 40ms for
$searched = "Maria%". Now, if I set $searched = "Mariasomething some%", time goes up to 2s.Why? Can it be speed up or perform in a more stable manner?
Solution
Explanation
We would have to see table definition, cardinalities and the
Only your spatial GiST index on
Normally, more restrictive conditions make queries faster, since fewer rows have to be fetched (and sorted). But not in this constellation:
Fewer qualifying rows don't help a nearest neighbour search with a small
-
The "nearest neighbour" search based on the GiST index - which is fast as long as some of the closest rows qualify - fastest if you remove the
-
A sequential scan, which has to solve the same problem for selective conditions: filter many rows until finding some candidates. The method is faster, but the whole table has to be read, and if more than a few rows are found, the final sort by distance gets expensive.
Basically, Postgres is stuck between a rock and a hard place with this wicked query.
Fix
You would need an index on
The best index still seems tricky and depends on your complete situation.
Since you match left-anchored patterns, I would try a separate btree index with the
More:
Then:
I use dollar-quoting to avoid escaping single quotes.
For repeated calls use
Or you could try a GiST index - a multicolumn, functional GiST index combining a PostGis spatial GiST index and a functional trigram GiST index. I didn't test that combination, yet.
You need to install pg_trgm first. Instructions:
I put
A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.
We would have to see table definition, cardinalities and the
EXPLAIN output to be certain, but the reason is most likely this:Only your spatial GiST index on
a.geom can be used. The btree index is not applicable. Postgres walks through the "closest" rows until it finds the first two matching your predicate.Normally, more restrictive conditions make queries faster, since fewer rows have to be fetched (and sorted). But not in this constellation:
Fewer qualifying rows don't help a nearest neighbour search with a small
LIMIT - au contrair. There are two options:-
The "nearest neighbour" search based on the GiST index - which is fast as long as some of the closest rows qualify - fastest if you remove the
WHERE clause altogether, try it! It gets increasingly expensive if many rows have to be filtered. Basically: the longer your pattern the more expensive the query.-
A sequential scan, which has to solve the same problem for selective conditions: filter many rows until finding some candidates. The method is faster, but the whole table has to be read, and if more than a few rows are found, the final sort by distance gets expensive.
Basically, Postgres is stuck between a rock and a hard place with this wicked query.
Fix
You would need an index on
unaccent(lower(name)) which is not possible, because unaccent() is only STABLE, not IMMUTABLE. But you can work around this limitation with an IMMUTABLE function wrapper - done right:- Does PostgreSQL support “accent insensitive” collations?
The best index still seems tricky and depends on your complete situation.
Since you match left-anchored patterns, I would try a separate btree index with the
text_pattern_ops operator class, using the function f_unaccent() outlined in the linked answer above:CREATE INDEX a_name_pattern_idx ON a (f_unaccent(lower(name)) text_pattern_ops);More:
- Pattern matching with LIKE, SIMILAR TO or regular expressions
Then:
- Rewrite the
WHEREcondition to match the indexed expression
- Make the wildcard to the right of the expression explicit to make it work for prepared statements.
- Don't concatenation text parameters into your query to begin with, that's an invitation for sneaky errors and SQL injection. Use a prepared statement:
$sql = 'SELECT a.name, ST_AsText(a.geom), b.name AS name_b, b.id
FROM a
LEFT JOIN b ON b.id = a.b_id
WHERE f_unaccent(lower(a.name)) LIKE (f_unaccent(lower($1)) || $$%$$)
ORDER BY a.geom ST_GeomFromText($$POINT($2 $3)$$, 4326)
LIMIT 2';
$result = pg_query_params($sql, array($searched, $lat, $lon));
I use dollar-quoting to avoid escaping single quotes.
For repeated calls use
pg_prepare and pg_execute. Details:- Problems with my attempt to implement an UPSERT
Or you could try a GiST index - a multicolumn, functional GiST index combining a PostGis spatial GiST index and a functional trigram GiST index. I didn't test that combination, yet.
You need to install pg_trgm first. Instructions:
- How is LIKE implemented?
CREATE INDEX a_special_gist_idx ON a
USING gist (f_unaccent(lower(name)) gist_trgm_ops, geom);I put
f_unaccent(lower(name)) first because, quoting the manual:A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.
Code Snippets
CREATE INDEX a_name_pattern_idx ON a (f_unaccent(lower(name)) text_pattern_ops);CREATE INDEX a_special_gist_idx ON a
USING gist (f_unaccent(lower(name)) gist_trgm_ops, geom);Context
StackExchange Database Administrators Q#125789, answer score: 3
Revisions (0)
No revisions yet.