HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Not searching GIN index when querying ANY(column)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
searchingcolumnanyginqueryingwhenindexnot

Problem

In a Postgres 9.4.11 database, I have a large table of books, which has a one-to-many relationship with authors. Theoretically to improve performance, I've cached the author names in the ebook record for quick searching. Properties of the column authors_cache:

Sample rows:

I've created a GIN index on this column:

CREATE  INDEX  "index_ebooks_on_authors_cache" ON "ebooks" USING gin ("authors_cache")


It's not using the index when searching authors_cache and takes a unacceptable amount of time:

EXPLAIN ANALYZE
SELECT  * FROM "ebooks"
WHERE ('Charles Bukowski' = ANY (authors_cache))
LIMIT 60 OFFSET 0;


Result:

Limit (cost=0.00..172334.67 rows=30 width=124) (actual time=71.962..10067.070 rows=59 loops=1)
-> Seq Scan on ebooks (cost=0.00..172334.67 rows=30 width=124) (actual time=71.960..10067.015 rows=59 loops=1)
Filter: ('Charles Bukowski'::text = ANY ((authors_cache)::text[]))
Rows Removed by Filter: 894504
Planning time: 0.188 ms
Execution time: 10067.112 ms


10 seconds is not an acceptable amount of time. I'm open to design changes, as I'm not "married" to this column, yet.

Solution

Use array operators which are supported by your GIN index. The = ANY () construct is not.

SELECT *
FROM   ebooks
WHERE  authors_cache @> '{Charles Bukowski}'  -- array literal
LIMIT  60;


Detailed explanation:

  • Can PostgreSQL index array columns?



Not sure your de-normalization will buy you much. A standard 1:n implementation with simple btree indexes should perform nicely, too.

If you are going with this approach, I would consider storing an array of integer (int[]) with author_id's. Substantially smaller array column and GIN index. Potential problems with referential integrity are about the same. (Array elements cannot currently have FK constraints. Attempts to implement this in Postgres were unsuccessful so far.)

Oh, and consider upgrading. Performance with GIN indexes has been improved substantially since Postgres 9.4.

Code Snippets

SELECT *
FROM   ebooks
WHERE  authors_cache @> '{Charles Bukowski}'  -- array literal
LIMIT  60;

Context

StackExchange Database Administrators Q#187959, answer score: 8

Revisions (0)

No revisions yet.