patternsqlMajor
Get partial match from GIN indexed TSVECTOR column
Viewed 0 times
columnmatchginindexedgettsvectorpartialfrom
Problem
I would like to get results by query this:
This works and return rows with
Do all queries have to be in dictionary? Can't we just query such letters? I have a table that contains e-mail bodies (content) and I would like to make queries fast as it grows rapidly. Currently, I am using:
SELECT *
FROM (
SELECT id, subject
FROM mailboxes
WHERE tsv @@ plainto_tsquery('avail')
) AS t1
ORDER BY id DESC;This works and return rows with
tsv containing 'Available'. But if I use 'avai' (truncating '...lable') it does not find anything.Do all queries have to be in dictionary? Can't we just query such letters? I have a table that contains e-mail bodies (content) and I would like to make queries fast as it grows rapidly. Currently, I am using:
... WHERE content ~* 'avail'Solution
Do all queries have to be in dictionary?
No. Because only word stems (according to the used text search configuration) are in the index to begin with. But more importantly:
No. Because, on top of that Full Text Search is also capable of prefix matching:
This would work:
Note 3 things:
-
Use
...
-
Use the
-
Append
Important: This is a prefix search on lexemes (word stems) in the document. A regular expression match without wildcards (
It's unclear whether you want the behavior outlined in your query or the equivalent of the added regular expression. Trigram indexes using the additional module (
Overview:
Demo
Related answer (chapter "Different approach to optimize search"):
Emails?
Since you mentioned emails, be aware that the text search parser identifies emails and does not split them into separate words / lexemes. Consider:
I would replace the separators
Also, since you are dealing with names in emails, not with English (or some other language) words, I would use the
Build the
No. Because only word stems (according to the used text search configuration) are in the index to begin with. But more importantly:
No. Because, on top of that Full Text Search is also capable of prefix matching:
This would work:
SELECT id, subject
FROM mailboxes
WHERE tsv @@ to_tsquery('simple', 'avail:*')
ORDER BY id DESC;Note 3 things:
-
Use
to_tsquery(), not plainto_tsquery(), in this case because (quoting the manual):...
plainto_tsquery will not recognize tsquery operators, weight labels, or prefix-match labels in its input-
Use the
'simple' text search configuration to generate the tsquery since you obviously want to take the word 'avail' as is and not apply stemming.-
Append
:* to make it a prefix search, i.e find all lexemes starting with 'avail'.Important: This is a prefix search on lexemes (word stems) in the document. A regular expression match without wildcards (
content ~* 'avail') is not exactly the same! The latter is not left-anchored (to the start of lexemes) and would also find 'FOOavail' etc.It's unclear whether you want the behavior outlined in your query or the equivalent of the added regular expression. Trigram indexes using the additional module (
pg_trgm) are the right tool for that. There are many related questions on dba.SE, try a search.Overview:
- Pattern matching with LIKE, SIMILAR TO or regular expressions
Demo
SELECT *
FROM (
VALUES
('Zend has no framework')
, ('Zend Framework')
) sub(t), to_tsvector(t) AS tsv
WHERE tsv @@ to_tsquery('zend fram:*'); id | t | tsv
----+----------------+------------------------
2 | Zend Framework | 'framework':2 'zend':1
Related answer (chapter "Different approach to optimize search"):
- How can I generate all trailing substrings following a delimeter?
Emails?
Since you mentioned emails, be aware that the text search parser identifies emails and does not split them into separate words / lexemes. Consider:
SELECT ts_debug('english', 'xangr@some.domain.com')(email,"Email address",xangr@some.domain.com,{simple},simple,{xangr@some.domain.com})I would replace the separators
@ and . in your emails with space (' ') to index contained words.Also, since you are dealing with names in emails, not with English (or some other language) words, I would use the
'simple' text search configuration to disable stemming and other language features:Build the
ts_vector column with:SELECT to_tsvector('simple', translate('joe.xangr@some.domain.com', '@.', ' ')) AS tsv;Code Snippets
SELECT id, subject
FROM mailboxes
WHERE tsv @@ to_tsquery('simple', 'avail:*')
ORDER BY id DESC;SELECT *
FROM (
VALUES
('Zend has no framework')
, ('Zend Framework')
) sub(t), to_tsvector(t) AS tsv
WHERE tsv @@ to_tsquery('zend <-> fram:*');SELECT ts_debug('english', 'xangr@some.domain.com')(email,"Email address",xangr@some.domain.com,{simple},simple,{xangr@some.domain.com})SELECT to_tsvector('simple', translate('joe.xangr@some.domain.com', '@.', ' ')) AS tsv;Context
StackExchange Database Administrators Q#157951, answer score: 34
Revisions (0)
No revisions yet.