patternsqlMinor
Case Insensitive ORDER BY clause using COLLATE
Viewed 0 times
casecollateorderinsensitiveusingclause
Problem
I have spent a long time looking for this, and I am getting mixed messages.
In other DBMSs (tested in SQLite, Oracle, MariaDB, MSSQL) I can override the default sort order using the
I have pored over the documentation and searched high and low, but I can’t find anything so straightforward for PostgreSQL.
Is there a
I know there are many questions regarding case sensitivity, but (a) most of them are old and (b) none that I have seen relate to the
FWIW, I am testing on PostgreSQL 11.8. I have a test fiddle on http://sqlfiddle.com/#!17/05cab/1, but it’s only for PostgreSQL 9.6.
MySQL/MariaDB and SQL Server default to case insensitive, and that would certainly make sense when sorting most text. Oracle and SQLite default to case sensitive, but have a relatively simple solution for a case insensitive sort. The default collation for my database is
In other DBMSs (tested in SQLite, Oracle, MariaDB, MSSQL) I can override the default sort order using the
COLLATE clause:SELECT *
FROM orderby
ORDER BY string COLLATE … ;
-- SQLite: BINARY | NOCASE
-- MariaDB: utf8mb4_bin | utf8mb4_general_ci
-- Oracle: BINARY | BINARY_CI
-- MSSQL: Latin1_General_BIN | Latin1_General_CI_AS
I have pored over the documentation and searched high and low, but I can’t find anything so straightforward for PostgreSQL.
Is there a
COLLATE clause value that would sort Case Insensitive?I know there are many questions regarding case sensitivity, but (a) most of them are old and (b) none that I have seen relate to the
COLLATE clause.FWIW, I am testing on PostgreSQL 11.8. I have a test fiddle on http://sqlfiddle.com/#!17/05cab/1, but it’s only for PostgreSQL 9.6.
MySQL/MariaDB and SQL Server default to case insensitive, and that would certainly make sense when sorting most text. Oracle and SQLite default to case sensitive, but have a relatively simple solution for a case insensitive sort. The default collation for my database is
en_US.UTF-8. I’m trying to fill in a few gaps here.Solution
If you were using PostgreSQL 12 or newer you would be able to create a new collation via the CREATE COLLATION command that specifies
Instead of sqlfiddle, I used db<>fiddle, which offers several versions of PostgreSQL, for the example:
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=a8f4b330d04266947fb11e5c04fa4891&hide=2
This solution is similar to what worked for this related answer (also on DBA.SE):
PostgreSQL nondeterministic collations are not supported for LIKE
The first four db<>fiddle example queries with an
The following statement from the Collation Support documentation for version 12 explains that
Note that while this system allows creating collations that “ignore case” or “ignore accents” or similar (using the ks key), in order for such collations to act in a truly case- or accent-insensitive manner, they also need to be declared as not deterministic in
But the same note from the same Collation Support documentation for version 11 explains that sorting is still effectively case-sensitive:
Note that while this system allows creating collations that “ignore case” or “ignore accents” or similar (using the ks key), PostgreSQL does not at the moment allow such collations to act in a truly case- or accent-insensitive manner. Any strings that compare equal according to the collation but are not byte-wise equal will be sorted according to their byte values.
Taking a closer look at the
That leaves you with two options:
The
The two main considerations are:
-
-
It sounds like
Also,
deterministic = false:CREATE COLLATION ci (provider = icu, locale = 'en-US-u-ks-level2', deterministic = false);
Instead of sqlfiddle, I used db<>fiddle, which offers several versions of PostgreSQL, for the example:
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=a8f4b330d04266947fb11e5c04fa4891&hide=2
This solution is similar to what worked for this related answer (also on DBA.SE):
PostgreSQL nondeterministic collations are not supported for LIKE
The first four db<>fiddle example queries with an
ORDER BY clause show three different sort results:- the
ORDER BY "string" COLLATE "POSIX"query returns rows in closer to a binary / ordinal ordering with all upper-case (A - Z) grouped before all lower-case (a - z).
- the
ORDER BY "string"andORDER BY "string" COLLATE "en_US.utf8"queries return rows in what is effectively case-sensitive order with "A" and "a" grouped together, and lower-case of each letter coming before the upper-case of the same letter (aA bB ...).
- the
ORDER BY "string" COLLATE "ci"query returns rows in a truly case-insensitive order with "A" and "a" grouped together, but this time the cases of each particular letter are not in a predefined order and the two cases can even be intermixed within a letter. Since it is not guarantee to produce the same order each time, here is a sample of the result set that I'm currently seeing for this query:
-8 banana
-15 Banana
-9 bANana
1116 BananaThe following statement from the Collation Support documentation for version 12 explains that
deterministic needs to be set to false:Note that while this system allows creating collations that “ignore case” or “ignore accents” or similar (using the ks key), in order for such collations to act in a truly case- or accent-insensitive manner, they also need to be declared as not deterministic in
CREATE COLLATION;... Otherwise, any strings that compare equal according to the collation but are not byte-wise equal will be sorted according to their byte values.But the same note from the same Collation Support documentation for version 11 explains that sorting is still effectively case-sensitive:
Note that while this system allows creating collations that “ignore case” or “ignore accents” or similar (using the ks key), PostgreSQL does not at the moment allow such collations to act in a truly case- or accent-insensitive manner. Any strings that compare equal according to the collation but are not byte-wise equal will be sorted according to their byte values.
Taking a closer look at the
CREATE COLLATION documentation, it seems that the deterministic property (which defaults to true but needs to be set to false to do insensitive comparisons and sorting) was introduced in Version 12, which doesn't really help you. (@LaurenzAlbe recommended an alternate syntax in a comment that might work on older versions of libicu, and adjusting the db<>fiddle test linked above for that syntax and to run on PostgreSQL 11, it does not appear to work: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=c3c48e111ed1837987524fee2c54a183&hide=2 )That leaves you with two options:
lower()function
- citext datatype (as mentioned by @a_horse_with_no_name in a comment on the question)
The
citext datatype calls the lower() function internally, but has the following benefits (according to the documentation, linked directly above):- it applies to implicit indexes created by
UNIQUEandPRIMARY KEYconstraints
- don't need to remember to add
lower()to both sides of a comparison
The two main considerations are:
-
citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching.-
citext doesn't help much if you need data to compare case-sensitively in some contexts and case-insensitively in other contexts. The standard answer is to use the text type and manually use the lower function when you need to compare case-insensitively; this works all right if case-insensitive comparison is needed only infrequently. If you need case-insensitive behavior most of the time and case-sensitive infrequently, consider storing the data as citext and explicitly casting the column to text when you want case-sensitive comparison. In either situation, you will need two indexes if you want both types of searches to be fast.It sounds like
citext creates a lower-case copy of the data that takes up more space but is slightly faster due to the pre-calculation.Also,
citext needs to be installed. I modified the earlier example queries to make use of citext to show what can be done on PostgreSQL 11, but it seems that citext is not installed on db<>fidCode Snippets
-8 banana
-15 Banana
-9 bANana
1116 BananaContext
StackExchange Database Administrators Q#280111, answer score: 4
Revisions (0)
No revisions yet.