HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

case sensivity incongruity

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
casesensivityincongruity

Problem

I'm trying to learn and understand collations and how postgresql compare and order strings but I'm finding an incongruity and I don't know what I'm missing.

The query:

SELECT datname, datcollate FROM pg_database;


Returns that all my DB uses en_GB.UTF-8 collation.

And the query:

SELECT table_schema, 
   table_name, 
   column_name,
   collation_name
FROM information_schema.columns
WHERE collation_name is not null
ORDER BY table_schema,
     table_name,
     ordinal_position;


Returns that all columns have 'C' collation, which means (according to my research) that no collation is specified. So, I supose that the DB collation is used, isn't?

EDIT: My mistake is here. The previous query doesn't return a result for all the columns, only for those which collation_name is not null, and the columns with null collation are the ones that inherit the DB collation. So I was thinking that foo.bar columns had C collation but was null in fact.

Well, the incongruity comes when I run the query:

SELECT "name" FROM foo.bar ORDER BY "name" ASC;


Where "name" column datatype is text and foo.bar is a user created schema.table combination. The result it's human-like alphabetically ordered. From a/A to Z/z, no matter if it's upper or lowercase.

But if I run the following query:

SELECT "table_name"
FROM information_schema.tables
WHERE
    "table_name" ~ 'some_pattern'
    AND table_schema = 'foo'
    AND table_type = 'BASE TABLE'
ORDER BY "table_name" ASC;


It orders the result in machine-like way, comparing byte by byte and therefore ordering uppercase before lowercase.

Why there is that difference? I've seen that the "table_name" column datatype isn't "text" but "name", but I can't find if it has something to do.

Having the DB collation set to en_GB.UTF-8 shouldn't be enough to compare text in a human way?

Thanks for your time.

Solution

The collation in PostgreSQL is determined as follows (simplified, for details see the documentation):

-
if there is an explicit COLLATE clause, that determines the collation

-
otherwise, if a table column was defines with a certain collation, that collation is used

-
if the column is not defined with a collation (it has the “default collation”), the database collation is used.

Now the column bar.name was not defined with a collation, so en_GB.UTF-8 is used.

But information_schema.columns has columns of type information_schema.sql_identifier, which is a domain over the data type name, which always uses the C collation. So upper case letters are sorted before lower case letters in that query.

To change that, you should explicitly specify the collation:

ORDER BY table_schema COLLATE "en_GB.UTF-8",
         table_name COLLATE "en_GB.UTF-8",
         ordinal_position;

Code Snippets

ORDER BY table_schema COLLATE "en_GB.UTF-8",
         table_name COLLATE "en_GB.UTF-8",
         ordinal_position;

Context

StackExchange Database Administrators Q#298651, answer score: 5

Revisions (0)

No revisions yet.