patternsqlMinor
case sensivity incongruity
Viewed 0 times
casesensivityincongruity
Problem
I'm trying to learn and understand collations and how postgresql compare and order strings but I'm finding an incongruity and I don't know what I'm missing.
The query:
Returns that all my DB uses en_GB.UTF-8 collation.
And the query:
Returns that all columns have 'C' collation, which means (according to my research) that no collation is specified. So, I supose that the DB collation is used, isn't?
EDIT: My mistake is here. The previous query doesn't return a result for all the columns, only for those which
Well, the incongruity comes when I run the query:
Where "name" column datatype is text and foo.bar is a user created schema.table combination. The result it's human-like alphabetically ordered. From a/A to Z/z, no matter if it's upper or lowercase.
But if I run the following query:
It orders the result in machine-like way, comparing byte by byte and therefore ordering uppercase before lowercase.
Why there is that difference? I've seen that the "table_name" column datatype isn't "text" but "name", but I can't find if it has something to do.
Having the DB collation set to en_GB.UTF-8 shouldn't be enough to compare text in a human way?
Thanks for your time.
The query:
SELECT datname, datcollate FROM pg_database;Returns that all my DB uses en_GB.UTF-8 collation.
And the query:
SELECT table_schema,
table_name,
column_name,
collation_name
FROM information_schema.columns
WHERE collation_name is not null
ORDER BY table_schema,
table_name,
ordinal_position;Returns that all columns have 'C' collation, which means (according to my research) that no collation is specified. So, I supose that the DB collation is used, isn't?
EDIT: My mistake is here. The previous query doesn't return a result for all the columns, only for those which
collation_name is not null, and the columns with null collation are the ones that inherit the DB collation. So I was thinking that foo.bar columns had C collation but was null in fact.Well, the incongruity comes when I run the query:
SELECT "name" FROM foo.bar ORDER BY "name" ASC;Where "name" column datatype is text and foo.bar is a user created schema.table combination. The result it's human-like alphabetically ordered. From a/A to Z/z, no matter if it's upper or lowercase.
But if I run the following query:
SELECT "table_name"
FROM information_schema.tables
WHERE
"table_name" ~ 'some_pattern'
AND table_schema = 'foo'
AND table_type = 'BASE TABLE'
ORDER BY "table_name" ASC;It orders the result in machine-like way, comparing byte by byte and therefore ordering uppercase before lowercase.
Why there is that difference? I've seen that the "table_name" column datatype isn't "text" but "name", but I can't find if it has something to do.
Having the DB collation set to en_GB.UTF-8 shouldn't be enough to compare text in a human way?
Thanks for your time.
Solution
The collation in PostgreSQL is determined as follows (simplified, for details see the documentation):
-
if there is an explicit
-
otherwise, if a table column was defines with a certain collation, that collation is used
-
if the column is not defined with a collation (it has the “default collation”), the database collation is used.
Now the column
But
To change that, you should explicitly specify the collation:
-
if there is an explicit
COLLATE clause, that determines the collation-
otherwise, if a table column was defines with a certain collation, that collation is used
-
if the column is not defined with a collation (it has the “default collation”), the database collation is used.
Now the column
bar.name was not defined with a collation, so en_GB.UTF-8 is used.But
information_schema.columns has columns of type information_schema.sql_identifier, which is a domain over the data type name, which always uses the C collation. So upper case letters are sorted before lower case letters in that query.To change that, you should explicitly specify the collation:
ORDER BY table_schema COLLATE "en_GB.UTF-8",
table_name COLLATE "en_GB.UTF-8",
ordinal_position;Code Snippets
ORDER BY table_schema COLLATE "en_GB.UTF-8",
table_name COLLATE "en_GB.UTF-8",
ordinal_position;Context
StackExchange Database Administrators Q#298651, answer score: 5
Revisions (0)
No revisions yet.