HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Postgres full text search with multiple columns, why concat in index and not at runtime?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexwhyfullsearchpostgreswithcolumnstextruntimemultiple

Problem

I've come across full text search in postgres in the last few days, and I am a little confused about indexing when searching across multiple columns.

The postgres docs talk about creating a ts_vector index on concatenated columns, like so:

CREATE INDEX pgweb_idx ON pgweb 
    USING gin(to_tsvector('english', title || ' ' || body));


which I can search like so:

... WHERE 
      (to_tsvector('english', title||' '||body) @@ to_tsquery('english', 'foo'))


However, if I wanted to sometimes search just the title, sometimes just the body, and sometimes both, I would need 3 separate indexes. And if I added in a third column, that could potentially be 6 indexes, and so on.

An alternative which I haven't seen in the docs is just to index the two columns seperately, and then just use a normal WHERE...OR query:

... WHERE
      (to_tsvector('english', title) @@ to_tsquery('english','foo'))
    OR
      (to_tsvector('english', body) @@ to_tsquery('english','foo'))


Benchmarking the two on ~1million rows seems to have basically no difference in performance.

So my question is:

Why would I want to concatenate indexes like this, rather than just indexing columns individually? What are the advantages/disadvantages of both?

My best guess is that if I knew in advance I would only want to ever search both columns (never one at a time) I would only ever need one index by concatenating which use less memory.

Solution

No you don't need separate indexes. Use the weights feature. They are just a label your can query against. You can have up to four labels to query against (A-D).

--search any "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick'::tsquery; --true

--search B "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:B'::tsquery; --false

--search B or C "fields" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:BC'::tsquery; --true


You might want to concatenate tsvectors, so that you can separately apply weights to them and then put them together:

select
  setweight( name_column::tsvector, 'A') || setweight( phone_column::tsvector, 'B');

Code Snippets

--search any "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick'::tsquery; --true

--search B "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:B'::tsquery; --false

--search B or C "fields" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:BC'::tsquery; --true
select
  setweight( name_column::tsvector, 'A') || setweight( phone_column::tsvector, 'B');

Context

StackExchange Database Administrators Q#15412, answer score: 6

Revisions (0)

No revisions yet.