patternsqlMinor
Postgres full text search with unaccent and inflection (conjugation, etc.)
Viewed 0 times
fullsearchpostgreswithtextinflectionconjugationandunaccentetc
Problem
I want to be able to search unaccented phrases in an inflected (Polish) language in Postgres.
Say, if a document contains
What I did is I started out with a perfectly well working polish text search config
Then I tried to extend it to include the
Sadly, lexems are not created correctly with this config:
The lexem should be of course:
So the below cant return true (and that's what I need I think):
I've googled but did not find any documents showing how to really configure Postgres for my case. The docs only show the lame 'Hôtels' example, which is not a 'lexemed' word.
Cheers
Say, if a document contains
robiłem, the lexeme should be robić (the infinivite). Its forms are robię, robił, robiła and so on. I want to be able to find it, for example, with a phrase robie which is unaccented robię.What I did is I started out with a perfectly well working polish text search config
CREATE TEXT SEARCH DICTIONARY polish_ispell (
TEMPLATE = pg_catalog.ispell,
dictfile = 'polish', afffile = 'polish', stopwords = 'polish' );Then I tried to extend it to include the
unaccent.create extension unaccent;
create text search configuration polish_unaccented (copy = polish);
ALTER TEXT SEARCH CONFIGURATION polish_unaccented ALTER MAPPING FOR hword,
hword_part, word WITH unaccen, polish_ispell, simple, ;Sadly, lexems are not created correctly with this config:
select to_tsvector('polish_unaccented' ,'robił');
'robil':1The lexem should be of course:
'robić':1So the below cant return true (and that's what I need I think):
select to_tsvector('polish_unaccented','robić') @@ to_tsquery('polish_unaccented','robie');I've googled but did not find any documents showing how to really configure Postgres for my case. The docs only show the lame 'Hôtels' example, which is not a 'lexemed' word.
Cheers
Solution
Wow, this was fun. So I wrote a program to do this for you called
pg_hunspellpg_hunspell pl PL polish
SELECT to_tsvector('polish' ,'robił');
to_tsvector
-------------
'robić':1
(1 row)Code Snippets
pg_hunspell pl PL polish
SELECT to_tsvector('polish' ,'robił');
to_tsvector
-------------
'robić':1
(1 row)Context
StackExchange Database Administrators Q#176471, answer score: 3
Revisions (0)
No revisions yet.