HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Inconsistent query execution times using trigram indexes?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
inconsistentqueryindexesusingtimestrigramexecution

Problem

I have a table residences...

I installed trgm: create extension pg_trgm;

After adding trigram indexes to both grp and name columns using:

CREATE INDEX residences_name_trgm ON residences USING GIN (name gin_trgm_ops);
CREATE INDEX residences_grp_trgm ON residences USING GIN (grp gin_trgm_ops);


And checking performance of a simple query... it appears that the total runtime for a query on the name column runs ~100x faster. Why is this?

EXPLAIN ANALYZE SELECT * FROM residences WHERE name ILIKE '%Sutton%';
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on residences  (cost=36.02..47.90 rows=3 width=1872) (actual time=0.390..0.720 rows=21 loops=1)
   Recheck Cond: ((name)::text ~~ '%Sutton%'::text)
   ->  Bitmap Index Scan on residences_name_trgm_gin  (cost=0.00..36.02 rows=3 width=0) (actual time=0.354..0.354 rows=21 loops=1)
         Index Cond: ((name)::text ~~ '%Sutton%'::text)
 Total runtime: 0.814 ms
(5 rows)


and

EXPLAIN ANALYZE SELECT * FROM residences WHERE grp ILIKE '%Sutton%';
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on residences  (cost=97.93..8879.41 rows=5927 width=1872) (actual time=5.516..115.634 rows=5968 loops=1)
   Recheck Cond: ((grp)::text ~~ '%Sutton%'::text)
   ->  Bitmap Index Scan on residences_grp_trgm_gin  (cost=0.00..96.45 rows=5927 width=0) (actual time=4.366..4.366 rows=5968 loops=1)
         Index Cond: ((grp)::text ~~ '%Sutton%'::text)
 Total runtime: 119.779 ms
(5 rows)


For reference, both name and grp col

Solution

That is, the name column actually has ~20x more distinct values, but returns values much faster.

Exactly so. An index on name is more selective than an index on grp, so index use is more effective.

This is confirmed by the fact that your first query, on the name field, the bitmap index scan onresidences_grp_trgm_gin gets only 21 results as shown by actual( ... rows=21 ...) in the explain analyze output. On grp you get 5968 results.

In a bitmap index scan every heap read and re-check has a significant cost, so it makes perfect sense that finding 21 rows is going to be a lot faster than finding 5968.

BTW, please always show select version() and your server configuration, as well as anything else that's relevant from the wiki's guide to reporting problems.

Context

StackExchange Database Administrators Q#46328, answer score: 5

Revisions (0)

No revisions yet.