HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

PostGIS - indexing - speed up query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgisindexingqueryspeed

Problem

I have this query (called from PHP, hence the $ variables):

SELECT a.name,
      St_AsText(a.geom), 
      b.name AS name_b,
      b.id
FROM a
LEFT JOIN b ON b.id = a.b_id
WHERE UNACCENT(LOWER(a.name)) LIKE UNACCENT('$searched%')
ORDER BY a.geom  'SRID=4326;POINT($lat $lon)'::geometry         
LIMIT 2;


There is lower(a.name) BTree index on a.name, GIST index on a.geom and b.id is primary key

It runs about 40ms for $searched = "Maria%". Now, if I set $searched = "Mariasomething some%", time goes up to 2s.

Why? Can it be speed up or perform in a more stable manner?

Solution

Explanation

We would have to see table definition, cardinalities and the EXPLAIN output to be certain, but the reason is most likely this:

Only your spatial GiST index on a.geom can be used. The btree index is not applicable. Postgres walks through the "closest" rows until it finds the first two matching your predicate.

Normally, more restrictive conditions make queries faster, since fewer rows have to be fetched (and sorted). But not in this constellation:
Fewer qualifying rows don't help a nearest neighbour search with a small LIMIT - au contrair. There are two options:

-
The "nearest neighbour" search based on the GiST index - which is fast as long as some of the closest rows qualify - fastest if you remove the WHERE clause altogether, try it! It gets increasingly expensive if many rows have to be filtered. Basically: the longer your pattern the more expensive the query.

-
A sequential scan, which has to solve the same problem for selective conditions: filter many rows until finding some candidates. The method is faster, but the whole table has to be read, and if more than a few rows are found, the final sort by distance gets expensive.

Basically, Postgres is stuck between a rock and a hard place with this wicked query.
Fix

You would need an index on unaccent(lower(name)) which is not possible, because unaccent() is only STABLE, not IMMUTABLE. But you can work around this limitation with an IMMUTABLE function wrapper - done right:

  • Does PostgreSQL support “accent insensitive” collations?



The best index still seems tricky and depends on your complete situation.

Since you match left-anchored patterns, I would try a separate btree index with the text_pattern_ops operator class, using the function f_unaccent() outlined in the linked answer above:

CREATE INDEX a_name_pattern_idx ON a (f_unaccent(lower(name)) text_pattern_ops);


More:

  • Pattern matching with LIKE, SIMILAR TO or regular expressions



Then:

  • Rewrite the WHERE condition to match the indexed expression



  • Make the wildcard to the right of the expression explicit to make it work for prepared statements.



  • Don't concatenation text parameters into your query to begin with, that's an invitation for sneaky errors and SQL injection. Use a prepared statement:



$sql = 'SELECT a.name, ST_AsText(a.geom), b.name AS name_b, b.id
FROM a
LEFT JOIN b ON b.id = a.b_id
WHERE f_unaccent(lower(a.name)) LIKE (f_unaccent(lower($1)) || $$%$$)
ORDER BY a.geom ST_GeomFromText($$POINT($2 $3)$$, 4326)
LIMIT 2';

$result = pg_query_params($sql, array($searched, $lat, $lon));


I use dollar-quoting to avoid escaping single quotes.

For repeated calls use pg_prepare and pg_execute. Details:

  • Problems with my attempt to implement an UPSERT



Or you could try a GiST index - a multicolumn, functional GiST index combining a PostGis spatial GiST index and a functional trigram GiST index. I didn't test that combination, yet.

You need to install pg_trgm first. Instructions:

  • How is LIKE implemented?



CREATE INDEX a_special_gist_idx ON a
USING gist (f_unaccent(lower(name)) gist_trgm_ops, geom);


I put f_unaccent(lower(name)) first because, quoting the manual:

A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.

Code Snippets

CREATE INDEX a_name_pattern_idx ON a (f_unaccent(lower(name)) text_pattern_ops);
CREATE INDEX a_special_gist_idx ON a
USING gist (f_unaccent(lower(name)) gist_trgm_ops, geom);

Context

StackExchange Database Administrators Q#125789, answer score: 3

Revisions (0)

No revisions yet.