HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

PostgreSQL GIN index not used when ts_query language is fetched from a column

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlfromcolumnusedfetchedginlanguagewhenindexts_query

Problem

I've got a table that stores some multilingual content:

CREATE TABLE search (
  content text NOT NULL,
  language regconfig NOT NULL,
  fulltext tsvector
);

CREATE INDEX search_fulltext ON search USING GIN(fulltext);

INSERT INTO search (language, content) VALUES 
  ('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen vliegen achterna'),
  ('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali op zijn kale koeli-kop.'),
  ('dutch', 'Moeder sneed zeven scheve sneden brood'),
  ('english', 'I saw Susie sitting in a shoe shine shop. Where she sits she shines, and where she shines she sits.'),
  ('english', 'How can a clam cram in a clean cream can?'),
  ('english', 'Can you can a can as a canner can can a can?');

UPDATE search SET fulltext = to_tsvector(language, content);


To make sure I always search in the correct language I use these queries:

SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
(1 row)

SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
(1 row)


Because hardcoding the language doesn't give the correct results:

SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
(0 rows)

SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
(0 rows)


The problem however is that PostgreSQL doesn't use the GIN index when using the first set of queries and instead does a sequential scan:

(Note: I've disabled scanning using SET enable_seqscan = OFF; for these examples because of the low amount of rows)

EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
---
Seq Scan on search  (cost=0.00..17.35 rows=2 width=136) (actual time=0.040..0.044 rows=1 loops=1)
    Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
    Rows Removed by Filter: 5
Planning time: 0.039 ms
Execution time: 0.064 ms
(5 rows)


While it does when hardcoding a language:

```
EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('

Solution

I suggest a solution with partial expression indexes:

CREATE TABLE search (
   search_id serial PRIMARY KEY
 , language  regconfig NOT NULL  -- order of columns matters a bit
 , content   text NOT NULL
   --  *no* redundant fulltext tsvector
);


No redundant fulltext column - makes the table smaller, which helps overall performance.

Create one partial expression index for every relevant language:

CREATE INDEX search_fulltext_dutch ON search USING GIN(to_tsvector('dutch', content))
WHERE language = 'dutch'::regconfig;
CREATE INDEX search_fulltext_english ON search USING GIN(to_tsvector('english', content))
WHERE language = 'english'::regconfig;
-- more?


All of the partial indexes together are only about as big as your total index.

Then match index conditions in the query:

SELECT * FROM search  -- does not return useless column fulltext now
WHERE  language = 'dutch'::regconfig  -- match partial index condition
AND    to_tsvector('dutch', content) @@ to_tsquery('dutch', 'vliegen')

UNION ALL
SELECT * FROM search
WHERE  language = 'english'::regconfig
AND    to_tsvector('english', content) @@ to_tsquery('english', 'vliegen');

-- more?


You get bitmap index or index scans this way.

Another index on language may be useful for other purposes, this query does not need it.

Code Snippets

CREATE TABLE search (
   search_id serial PRIMARY KEY
 , language  regconfig NOT NULL  -- order of columns matters a bit
 , content   text NOT NULL
   --  *no* redundant fulltext tsvector
);
CREATE INDEX search_fulltext_dutch ON search USING GIN(to_tsvector('dutch', content))
WHERE language = 'dutch'::regconfig;
CREATE INDEX search_fulltext_english ON search USING GIN(to_tsvector('english', content))
WHERE language = 'english'::regconfig;
-- more?
SELECT * FROM search  -- does not return useless column fulltext now
WHERE  language = 'dutch'::regconfig  -- match partial index condition
AND    to_tsvector('dutch', content) @@ to_tsquery('dutch', 'vliegen')

UNION ALL
SELECT * FROM search
WHERE  language = 'english'::regconfig
AND    to_tsvector('english', content) @@ to_tsquery('english', 'vliegen');

-- more?

Context

StackExchange Database Administrators Q#149765, answer score: 3

Revisions (0)

No revisions yet.