patternsqlModerate
PostgreSQL full text search on many columns
Viewed 0 times
postgresqlfullsearchcolumnstextmany
Problem
I need an advice with searching of record based on specified string.
Search strings can contain values from these columns. Values in this string don´t have to be strictly identical given in the correct order and also the values of some columns in this string may be missing.
Example of search string:
And I get for example result with top 5 similar records.
I think I should use full text search, but I have no experiences with it. Can you tell me how to proceed?
Search strings can contain values from these columns. Values in this string don´t have to be strictly identical given in the correct order and also the values of some columns in this string may be missing.
Example of search string:
22 Karntner WienAnd I get for example result with top 5 similar records.
I think I should use full text search, but I have no experiences with it. Can you tell me how to proceed?
Solution
I suggest this expression for query and index:
Note the custom function
Or as "standard SQL" function in Postgres 14+. See:
It's a drop-in replacement for
About
Or, if you you have superuser privileges, see:
For many columns, this is shorter and faster. You could do without it but then the syntax gets rather verbose (see joanolo's answer).
The matching index to go with this:
You are dealing with international address data, so do not use the
Concatenate the strings and call the more expensive function
Like I commented, Full Text Search has limited support for fuzzy matching, but there is the often overlooked feature of prefix matching:
So, if you are not sure whether it's 'Kärntner' or 'Kärnten', and whether it's 'Straße', 'strasse' or 'Strabe' (like in your buggy example data) but you know that the second word follows the first, you could:
`
Related:
-
How do IMMUTABLE, STABLE and VOLATILE keywords effect behavior of function?
-
Combine two columns and add into one new column
SELECT * FROM tbl
WHERE to_tsvector('simple', immutable_concat_ws(' ', country, city, street, house_nr, postcode))
@@ plainto_tsquery('simple', '22 Kärntner Wien');Note the custom function
immutable_concat_ws(). concat_ws() is only STABLE, not IMMUTABLE. Create this function first:CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT array_to_string($2, $1)';Or as "standard SQL" function in Postgres 14+. See:
- What does BEGIN ATOMIC mean in a PostgreSQL SQL function / procedure?
CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE
RETURN array_to_string($2, $1);It's a drop-in replacement for
concat_ws(), except that it only takes text input, which allows us to make it IMMUTABLE without cheating, effectively. Passing non-immutable input (with explicit cast to text) would void this warranty! Detailed explanation (read it!):- Combine two columns and add into one new column
About
VARIADIC:- Passing multiple values in single parameter
Or, if you you have superuser privileges, see:
- Create an immutable clone of concat_ws
For many columns, this is shorter and faster. You could do without it but then the syntax gets rather verbose (see joanolo's answer).
The matching index to go with this:
CREATE INDEX tbl_adr_fts_idx ON tbl USING GIN (
to_tsvector('simple', immutable_concat_ws(' ', country, city, street, house_nr, postcode)));You are dealing with international address data, so do not use the
english text search configuration. Stemming makes little sense for names and most of your example data is not even English to begin with. Use the simple configuration instead. You need the form with two parameters - see below.Concatenate the strings and call the more expensive function
to_tsvector() once. Use concat_ws() to deal with possible NULL values elegantly. Cheaper overall, and also shorter.- String replace using concatenated strings from various columns
Like I commented, Full Text Search has limited support for fuzzy matching, but there is the often overlooked feature of prefix matching:
- Get partial match from GIN indexed TSVECTOR column
So, if you are not sure whether it's 'Kärntner' or 'Kärnten', and whether it's 'Straße', 'strasse' or 'Strabe' (like in your buggy example data) but you know that the second word follows the first, you could:
... @@ to_tsquery('simple', '22 & Kärnt:* Stra:* & Wien')`
is the phrase search operator and requires Postgres 9.6.
- How to store short stories for access to individual sentences?
And if you want to ignore diacritical signs as well ('ä' <> 'a'), add unaccent() to the mix. You can use it as separate function or you can add it as dictionary to your text search configuration. You need to install the extension first ...
- Does PostgreSQL support “accent insensitive” collations?
Overview over pattern matching option in typical Postgres installations:
- Pattern matching with LIKE, SIMILAR TO or regular expressions
Joanolo already provided some basic information about FTS and the link to the manual for more.
Addressing your comment
I am trying add this index it but gives me an error:
ERROR: functions in index expression must be marked IMMUTABLE
There are two variants of the function to_tsvector() - see "function overloading". The 1st takes only text, the 2nd takes regconfig and text. See for yourself:
SELECT proname, provolatile, proargtypes[0]::regtype, proargtypes[1]::regtype
FROM pg_proc
WHERE proname = 'to_tsvector';
Only the second is IMMUTABLE and can be used in an index expression directly. 'simple' in the above example is a text search configuration (regconfig).
More importantly, my oversight: concat_ws() (which I had in my first version) is only STABLE, not IMMUTABLE`. I added necessary steps above.Related:
-
How do IMMUTABLE, STABLE and VOLATILE keywords effect behavior of function?
-
Combine two columns and add into one new column
Code Snippets
SELECT * FROM tbl
WHERE to_tsvector('simple', immutable_concat_ws(' ', country, city, street, house_nr, postcode))
@@ plainto_tsquery('simple', '22 Kärntner Wien');CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT array_to_string($2, $1)';CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE
RETURN array_to_string($2, $1);CREATE INDEX tbl_adr_fts_idx ON tbl USING GIN (
to_tsvector('simple', immutable_concat_ws(' ', country, city, street, house_nr, postcode)));... @@ to_tsquery('simple', '22 & Kärnt:* <-> Stra:* & Wien')Context
StackExchange Database Administrators Q#164000, answer score: 18
Revisions (0)
No revisions yet.