snippetsqlMinor
Using SELECT within to_tsvector call in CREATE INDEX
Viewed 0 times
indexcreatewithincallusingselectto_tsvector
Problem
I'm attempting to create a GIN index for full-text search on a field that may have text in it that is one of several different languages. The languages are pre-set and we'll know which language we want to search in at query time. Here is the schema I've got thus far...
It reports a syntax error at or near "SELECT", which makes sense...I didn't think I could just shoehorn a select statement in there, but I'm not sure how to correctly use a select statement within the to_tsvector call.
I looked at https://stackoverflow.com/a/21299033/4471711, however that hardcodes the languages, whereas I want the languages to be pulled from a table dynamically at index creation time. Obviously, if/when we add a language, we would need to recreate the index.
I also looked at https://stackoverflow.com/a/15135730/4471711, however that is not in support of creating an index and only requires 1 parameter to to_tsvector, whereas I seem to need to pass in both the language and the text column.
I am also trying to create a full-text search index on a more document-style field as below:
```
CREATE TABLE source.post (
id BIGSERIAL PRIMARY KEY,
text TEXT N
CREATE SCHEMA source;
CREATE SCHEMA common;
CREATE TABLE common.lang (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
code TEXT NOT NULL UNIQUE,
created_at TIMESTAMP NOT NULL DEFAULT now()
);
INSERT INTO common.lang (name, code) VALUES
('english', 'en'),
('arabic', 'ar'),
('chinese', 'zh'),
('undefined', 'und');
CREATE TABLE source.user (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
screen_name TEXT NOT NULL UNIQUE,
profile_lang_id INTEGER NOT NULL REFERENCES common.lang(id) ON DELETE CASCADE ON UPDATE CASCADE,
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX source_user_names_idx ON source.user USING GIN (to_tsvector(SELECT common.lang.name FROM source.user INNER JOIN common.lang ON common.lang.id = source.user.profile_lang_id, name || ' ' || screen_name));It reports a syntax error at or near "SELECT", which makes sense...I didn't think I could just shoehorn a select statement in there, but I'm not sure how to correctly use a select statement within the to_tsvector call.
I looked at https://stackoverflow.com/a/21299033/4471711, however that hardcodes the languages, whereas I want the languages to be pulled from a table dynamically at index creation time. Obviously, if/when we add a language, we would need to recreate the index.
I also looked at https://stackoverflow.com/a/15135730/4471711, however that is not in support of creating an index and only requires 1 parameter to to_tsvector, whereas I seem to need to pass in both the language and the text column.
I am also trying to create a full-text search index on a more document-style field as below:
```
CREATE TABLE source.post (
id BIGSERIAL PRIMARY KEY,
text TEXT N
Solution
- You don't. You can't create an index that references another table.
- To set the default see
default-text-search-config
Workaround
You can however use functions in an index. And functions can references external tables. That said, using this is kind of hack because changing the table (
common.lang) will require reindexing and clearing the session cache.CREATE FUNCTION common.mylookup(id int)
RETURNS regconfig AS $
SELECT name::regconfig
FROM common.lang
WHERE id = id
$
LANGUAGE sql
IMMUTABLE;
CREATE INDEX
ON source.user
USING GIN (to_tsvector(common.mylookup(profile_lang_id), name || ' ' || screen_name ));You can mark functions as
IMMUTABLE which makes this permissible. If the underlying table mutates you'll have to REINDEX the index. In projects that use this hack, like PostGIS, they recreate indexes on point releases.Followup
@EvanCarroll, I'm trying to create the tsvector on name || ' ' || screen_name. – Brooks 4 mins ago
Full text search isn't there to do what you think it does. It's not there to search multiple fields. It's there to vectorize word content and make use of dictionaries, stubbing, lexers, gazetteers, stop-word elimination, and a slew of other tricks none of which apply. If this doesn't make sense to you, you'll have to read the docs. If what you want is grep then FTS is only seldom what you want. If you want to grep over small chunks of non-standard text (like names) it's not what you want. What you likely want trigram indexing.
If all you want is a
%term% on two fields, you're better off just doing that with a trigram index.CREATE EXTENSION pg_trgm;
CREATE INDEX ON source.user USING GIN ((name || ' ' || screen_name) gin_trgm_ops);
WHERE name || ' ' || screen_name like '%$1%';Or even better,
CREATE INDEX ON source.user USING GIN (name gin_trgm_ops, screen_name gin_trgm_ops);
WHERE name LIKE '%$1%' OR screen_name LIKE '%$1%';Code Snippets
CREATE FUNCTION common.mylookup(id int)
RETURNS regconfig AS $$
SELECT name::regconfig
FROM common.lang
WHERE id = id
$$
LANGUAGE sql
IMMUTABLE;
CREATE INDEX
ON source.user
USING GIN (to_tsvector(common.mylookup(profile_lang_id), name || ' ' || screen_name ));CREATE EXTENSION pg_trgm;
CREATE INDEX ON source.user USING GIN ((name || ' ' || screen_name) gin_trgm_ops);
WHERE name || ' ' || screen_name like '%$1%';CREATE INDEX ON source.user USING GIN (name gin_trgm_ops, screen_name gin_trgm_ops);
WHERE name LIKE '%$1%' OR screen_name LIKE '%$1%';Context
StackExchange Database Administrators Q#182510, answer score: 3
Revisions (0)
No revisions yet.