snippetsqlMinor
How can I implement full-text search on a single value in a text[] column?
Viewed 0 times
canfullimplementsearchcolumntextvaluesinglehow
Problem
I am using Postgresql 9.3 on an Ubuntu 12.4 machine.
I've only recently started studying database and database design, and I am to create a database that includes text that will later be searched through. I've read up on tsvectors and stuff better than
Thanks for the attention, and this is my first post ever here, so if I messed up on this question do tell me. P.S.: I've searched this topic here beforehand, I hope I am not creating a redundant question. P.P.S.: Not a native speaker, sorry for the weird language.
EDIT: Thanks for the answers! The texts in the array won't be of fixed size. They will range from 6 ~ 20 lines, but they need to be editable, and some of the texts inside an array will be formatted in a different manner when displayed to the user. I'm not using multiple
I've only recently started studying database and database design, and I am to create a database that includes text that will later be searched through. I've read up on tsvectors and stuff better than
WHERE column LIKE 'query', but I don't have much experience with it. I am thinking of building a table where one column is of type text[], and another one will be the tsvector build from the text. I will use and array of text because some of the text will be displayed to the user with different formatting etc. I have never worked with array columns before. My questions are:- Is using a text array a good idea at all in this case?
- Will the text array work well with the tsvectors?
- Will using
text[]compromise my performance?
- Are there any pitfalls I should stay alert for?
Thanks for the attention, and this is my first post ever here, so if I messed up on this question do tell me. P.S.: I've searched this topic here beforehand, I hope I am not creating a redundant question. P.P.S.: Not a native speaker, sorry for the weird language.
EDIT: Thanks for the answers! The texts in the array won't be of fixed size. They will range from 6 ~ 20 lines, but they need to be editable, and some of the texts inside an array will be formatted in a different manner when displayed to the user. I'm not using multiple
text columns because some of the entries in the table will have a single, two, or more elements inside the array. I know how each element will be formatted because every row will also have a "type" column, which will define it's formatting.Solution
There are a lot of problems with using
How will you know which member of the array gets what kind of formatting? If that logic is hard-coded into the position of the member in the array, then why not just have multiple columns and key the formatting to the column names?
You can keep this more direct and simpler by following a golden rule,
Moreover, the
As a side note, in a normalized display you can do weighing/ranking with FTS. In order to that with your schema, you'd have to move from the schemaless abuse of
Abomination
If you decide to go down this route, this should be substantially faster and cleaner. But, this is a horrible idea.
Compare the two,
See also
text[] like this, but fundamentally I agree @jjanes when he says,How will you know which member of the array gets what kind of formatting? If that logic is hard-coded into the position of the member in the array, then why not just have multiple columns and key the formatting to the column names?
You can keep this more direct and simpler by following a golden rule,
- If the elements in the array can not be shuffled, then the order matters and the elements are ordinal
- If the order matters, you should not be using an
ARRAYtype, regardless of Full Text Search.
Moreover, the
to_tsvector call is a pretty bad idea, this design is problematic too. ARRAY is not for schema-less design, and the position in the array should not itself be information need to render the data in the app.As a side note, in a normalized display you can do weighing/ranking with FTS. In order to that with your schema, you'd have to move from the schemaless abuse of
text[] to an actual schema'd design anyway.Abomination
If you decide to go down this route, this should be substantially faster and cleaner. But, this is a horrible idea.
CREATE AGGREGATE tsvector_agg (tsvector) (
SFUNC = tsvector_concat,
STYPE = tsvector
);
CREATE OR REPLACE FUNCTION text_array_to_tsvector( mytext text[], out tsv tsvector )
AS $func$
BEGIN
SELECT INTO tsv
tsvector_agg(to_tsvector(t))
FROM unnest(mytext) AS t;
RETURN;
END;
$func$ LANGUAGE plpgsql
IMMUTABLE;Compare the two,
EXPLAIN ANALYZE SELECT to_tsvector(array_to_string(ARRAY['foo', 'bar', 'baz'],' '::text));
EXPLAIN ANALYZE SELECT text_array_to_tsvector(ARRAY['foo', 'bar', 'baz']);See also
- How can I implement full-text search on a value in a JSONB column?
Code Snippets
CREATE AGGREGATE tsvector_agg (tsvector) (
SFUNC = tsvector_concat,
STYPE = tsvector
);
CREATE OR REPLACE FUNCTION text_array_to_tsvector( mytext text[], out tsv tsvector )
AS $func$
BEGIN
SELECT INTO tsv
tsvector_agg(to_tsvector(t))
FROM unnest(mytext) AS t;
RETURN;
END;
$func$ LANGUAGE plpgsql
IMMUTABLE;EXPLAIN ANALYZE SELECT to_tsvector(array_to_string(ARRAY['foo', 'bar', 'baz'],' '::text));
EXPLAIN ANALYZE SELECT text_array_to_tsvector(ARRAY['foo', 'bar', 'baz']);Context
StackExchange Database Administrators Q#162932, answer score: 2
Revisions (0)
No revisions yet.