patternsqlMinor
Postgres full text search with multiple columns, why concat in index and not at runtime?
Viewed 0 times
indexwhyfullsearchpostgreswithcolumnstextruntimemultiple
Problem
I've come across full text search in postgres in the last few days, and I am a little confused about indexing when searching across multiple columns.
The postgres docs talk about creating a
which I can search like so:
However, if I wanted to sometimes search just the title, sometimes just the body, and sometimes both, I would need 3 separate indexes. And if I added in a third column, that could potentially be 6 indexes, and so on.
An alternative which I haven't seen in the docs is just to index the two columns seperately, and then just use a normal
Benchmarking the two on ~1million rows seems to have basically no difference in performance.
So my question is:
Why would I want to concatenate indexes like this, rather than just indexing columns individually? What are the advantages/disadvantages of both?
My best guess is that if I knew in advance I would only want to ever search both columns (never one at a time) I would only ever need one index by concatenating which use less memory.
The postgres docs talk about creating a
ts_vector index on concatenated columns, like so:CREATE INDEX pgweb_idx ON pgweb
USING gin(to_tsvector('english', title || ' ' || body));which I can search like so:
... WHERE
(to_tsvector('english', title||' '||body) @@ to_tsquery('english', 'foo'))However, if I wanted to sometimes search just the title, sometimes just the body, and sometimes both, I would need 3 separate indexes. And if I added in a third column, that could potentially be 6 indexes, and so on.
An alternative which I haven't seen in the docs is just to index the two columns seperately, and then just use a normal
WHERE...OR query:... WHERE
(to_tsvector('english', title) @@ to_tsquery('english','foo'))
OR
(to_tsvector('english', body) @@ to_tsquery('english','foo'))Benchmarking the two on ~1million rows seems to have basically no difference in performance.
So my question is:
Why would I want to concatenate indexes like this, rather than just indexing columns individually? What are the advantages/disadvantages of both?
My best guess is that if I knew in advance I would only want to ever search both columns (never one at a time) I would only ever need one index by concatenating which use less memory.
Solution
No you don't need separate indexes. Use the weights feature. They are just a label your can query against. You can have up to four labels to query against (A-D).
You might want to concatenate tsvectors, so that you can separately apply weights to them and then put them together:
--search any "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick'::tsquery; --true
--search B "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:B'::tsquery; --false
--search B or C "fields" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:BC'::tsquery; --trueYou might want to concatenate tsvectors, so that you can separately apply weights to them and then put them together:
select
setweight( name_column::tsvector, 'A') || setweight( phone_column::tsvector, 'B');Code Snippets
--search any "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick'::tsquery; --true
--search B "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:B'::tsquery; --false
--search B or C "fields" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:BC'::tsquery; --trueselect
setweight( name_column::tsvector, 'A') || setweight( phone_column::tsvector, 'B');Context
StackExchange Database Administrators Q#15412, answer score: 6
Revisions (0)
No revisions yet.