patternsqlMinor
Equivalent of UTF8_UNICODE_CI collation in PostgreSQL
Viewed 0 times
postgresqlutf8_unicode_ciequivalentcollation
Problem
I would like a column in a table inside a PostgreSQL database (I am using version 9.6). I know of the
but I get:
Reading around, I found that the
So it's clearly not there... Is there any way to add it, or obtain the behavior I want? (I want a search
UTF8_UNICODE_CI collation on MySQL, so I tried:CREATE TABLE thing (
id BIGINT PRIMARY KEY
,name VARCHAR(120) NOT NULL COLLATE "UTF8_UNICODE_CI"
);but I get:
ERROR: collation "UTF8_UNICODE_CI" for encoding "UTF8" does not existReading around, I found that the
pg_collation table displays collations, which showed:=# SELECT * from pg_collation;
collname | collnamespace | collowner | collencoding | collcollate | collctype
----------+---------------+-----------+--------------+-------------+-----------
default | 11 | 10 | -1 | |
C | 11 | 10 | -1 | C | C
POSIX | 11 | 10 | -1 | POSIX | POSIX
(3 rows)So it's clearly not there... Is there any way to add it, or obtain the behavior I want? (I want a search
WHERE name LIKE '%lala%' to match 'lalá', 'LÂLÄ', etc.)Solution
In Postgres, you would typically solve this differently.
For starters, just use
The key word
case-insensitive according to the active locale. This is not in the
SQL standard but is a PostgreSQL extension.
You do not need a different collation for this. Trigram indexes support both case sensitive and insensitive pattern matching:
Or use the more portable
Or you could use the special data type
If you insist on a special collation, you can create your own:
I would not do that.
If you want to take string "normalization" one step further (remove all diacritic signs), look to
You can combine both, instructions in the linked answer. More here:
Overview:
I would just use
For starters, just use
ILIKE for case insensitive matching. The manual:The key word
ILIKE can be used instead of LIKE to make the matchcase-insensitive according to the active locale. This is not in the
SQL standard but is a PostgreSQL extension.
WHERE name ILIKE '%lala%'You do not need a different collation for this. Trigram indexes support both case sensitive and insensitive pattern matching:
- LOWER LIKE vs iLIKE
- How is LIKE implemented?
Or use the more portable
lower(col) LIKEWHERE lower(name) LIKE '%lala%' -- pattern must be lower case, too.- Generic Ruby solution for SQLite3 “LIKE” or PostgreSQL “ILIKE”?
Or you could use the special data type
citext provided by the additional module citext:- Index on column with data type citext not used
If you insist on a special collation, you can create your own:
- Add a new collation to a Postgres database
I would not do that.
If you want to take string "normalization" one step further (remove all diacritic signs), look to
unaccent():- Does PostgreSQL support “accent insensitive” collations?
You can combine both, instructions in the linked answer. More here:
- PostgreSQL accent + case insensitive search
Overview:
- Pattern matching with LIKE, SIMILAR TO or regular expressions
I would just use
ILIKE and support it with a trigram index for big tables. Preferably use the latest version Postgres 9.6. It has several improvements for trigram GIN indexes.Code Snippets
WHERE name ILIKE '%lala%'WHERE lower(name) LIKE '%lala%' -- pattern must be lower case, too.Context
StackExchange Database Administrators Q#156907, answer score: 5
Revisions (0)
No revisions yet.