patternsqlMinor
Why tsquery containing a hyphen isn't found in full text search?
Viewed 0 times
containingwhyfullsearchtextfoundtsqueryhyphenisn
Problem
Trying to figure out, why the below query isn't returning any results:
The term exists in the headline of a story in the database (PostgreSQL 11.12) like:
The only way I could get the query to work was to replace the hyphen with a dash in both the vector and the query which worked, so I tried debugging the phrase, but it seems like the hyphen is but a space symbol - I thought it has to do something with the numerals in the phrase, or is it?
select id from story where tsv_story_text @@ to_tsquery('CumEx-Files2.0');The term exists in the headline of a story in the database (PostgreSQL 11.12) like:
CumEx-Files 2.0 investigation reveals €150bn tax fraud
The only way I could get the query to work was to replace the hyphen with a dash in both the vector and the query which worked, so I tried debugging the phrase, but it seems like the hyphen is but a space symbol - I thought it has to do something with the numerals in the phrase, or is it?
SELECT * FROM ts_debug('english', 'CumEx-Files2.0');
alias | description | token | dictionaries | dictionary | lexemes
-----------------+---------------------------------+-------------+----------------+--------------+-------------
asciihword | Hyphenated word, all ASCII | CumEx-Files | {english_stem} | english_stem | {cumex-fil}
hword_asciipart | Hyphenated word part, all ASCII | CumEx | {english_stem} | english_stem | {cumex}
blank | Space symbols | - | {} | |
hword_asciipart | Hyphenated word part, all ASCII | Files | {english_stem} | english_stem | {file}
blank | Space symbols | | {} | |
float | Decimal notation | 2.0 | {simple} | simple | {2.0}Solution
There was change of behavior in Postgres 14. (A fix, really, but not back-patched, because change of behavior ...)
The release notes for Postgres 14:
-
Fix
Certain discarded tokens, like underscore, caused the output of these functions to produce incorrect tsquery output, e.g., both
That's exactly what happens in your case. Compare the results in the following two fiddles:
db<>fiddle here - Postgres 11
db<>fiddle here - Postgres 14
So your query:
... now works as expected in Postgres 14 or later (returns
The release notes for Postgres 14:
-
Fix
to_tsquery() and websearch_to_tsquery() to properly parse query text containing discarded tokens (Alexander Korotkov)Certain discarded tokens, like underscore, caused the output of these functions to produce incorrect tsquery output, e.g., both
websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class pg') used to output ( 'pg' & 'class' ) 'pg', but now both output 'pg' 'class' 'pg'.That's exactly what happens in your case. Compare the results in the following two fiddles:
db<>fiddle here - Postgres 11
db<>fiddle here - Postgres 14
So your query:
SELECT to_tsvector('english', 'CumEx-Files 2.0 investigation reveals €150bn tax fraud') @@ to_tsquery('english', 'CumEx-Files2.0');... now works as expected in Postgres 14 or later (returns
true). Consider upgrading.Code Snippets
SELECT to_tsvector('english', 'CumEx-Files 2.0 investigation reveals €150bn tax fraud') @@ to_tsquery('english', 'CumEx-Files<->2.0');Context
StackExchange Database Administrators Q#302750, answer score: 4
Revisions (0)
No revisions yet.