HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Equivalent of UTF8_UNICODE_CI collation in PostgreSQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlutf8_unicode_ciequivalentcollation

Problem

I would like a column in a table inside a PostgreSQL database (I am using version 9.6). I know of the UTF8_UNICODE_CI collation on MySQL, so I tried:

CREATE TABLE thing (
    id    BIGINT PRIMARY KEY
   ,name  VARCHAR(120) NOT NULL COLLATE "UTF8_UNICODE_CI"
);


but I get:

ERROR: collation "UTF8_UNICODE_CI" for encoding "UTF8" does not exist


Reading around, I found that the pg_collation table displays collations, which showed:

=# SELECT * from pg_collation;
 collname | collnamespace | collowner | collencoding | collcollate | collctype
----------+---------------+-----------+--------------+-------------+-----------
 default  |            11 |        10 |           -1 |             |
 C        |            11 |        10 |           -1 | C           | C
 POSIX    |            11 |        10 |           -1 | POSIX       | POSIX
(3 rows)


So it's clearly not there... Is there any way to add it, or obtain the behavior I want? (I want a search WHERE name LIKE '%lala%' to match 'lalá', 'LÂLÄ', etc.)

Solution

In Postgres, you would typically solve this differently.

For starters, just use ILIKE for case insensitive matching. The manual:

The key word ILIKE can be used instead of LIKE to make the match
case-insensitive according to the active locale. This is not in the
SQL standard but is a PostgreSQL extension.

WHERE name ILIKE '%lala%'


You do not need a different collation for this. Trigram indexes support both case sensitive and insensitive pattern matching:

  • LOWER LIKE vs iLIKE



  • How is LIKE implemented?



Or use the more portable lower(col) LIKE

WHERE lower(name) LIKE '%lala%'  -- pattern must be lower case, too.


  • Generic Ruby solution for SQLite3 “LIKE” or PostgreSQL “ILIKE”?



Or you could use the special data type citext provided by the additional module citext:

  • Index on column with data type citext not used



If you insist on a special collation, you can create your own:

  • Add a new collation to a Postgres database



I would not do that.

If you want to take string "normalization" one step further (remove all diacritic signs), look to unaccent():

  • Does PostgreSQL support “accent insensitive” collations?



You can combine both, instructions in the linked answer. More here:

  • PostgreSQL accent + case insensitive search



Overview:

  • Pattern matching with LIKE, SIMILAR TO or regular expressions



I would just use ILIKE and support it with a trigram index for big tables. Preferably use the latest version Postgres 9.6. It has several improvements for trigram GIN indexes.

Code Snippets

WHERE name ILIKE '%lala%'
WHERE lower(name) LIKE '%lala%'  -- pattern must be lower case, too.

Context

StackExchange Database Administrators Q#156907, answer score: 5

Revisions (0)

No revisions yet.