patternsqlMinor
Confused over encoding/locale in postgresql
Viewed 0 times
postgresqlconfusedlocaleencodingover
Problem
I am a little confused over the difference between encoding and locale as it pertains to the postgresql database.
When initializing the database, you can specify both an encoding and a locale. Correct me if I am wrong but I assume the encoding defines what the database is actually stored as on the computer.
So if I specify
I think the
From what I have read is that UTF8 can be used with ANY locale so what is the point in specifying different encodings if UTF8 is so ubiquitous and the locale is really the one specifying the specific character set to be used?
Also, on Linux, the locale can be specified like so:
When initializing the database, you can specify both an encoding and a locale. Correct me if I am wrong but I assume the encoding defines what the database is actually stored as on the computer.
So if I specify
UTF8, all the characters in the UTF8 character set would be valid characters. If I specified WIN1252, all those characters would be valid characters. What I dont get is where the locale comes into play. If I specify my encoding as UTF8 and then specify my locale as English_United States.1252, what does that exactly mean? I think the
WIN1252 character set is a subset of UTF8 so is the locale just specifying the subset of characters to be used from the UTF8 character set? From what I have read is that UTF8 can be used with ANY locale so what is the point in specifying different encodings if UTF8 is so ubiquitous and the locale is really the one specifying the specific character set to be used?
Also, on Linux, the locale can be specified like so:
en_us.utf8. So the database encoding is specified in the locale? If the encoding is specified in the locale, why even have a -encoding flag when initializing the database?Solution
The
The
The settings of particular interest for you are
In older versions, these two settings could not be changed after a database had been initialized. Since Postgres 9.1 you can at least override the collation setting when needed.
encoding defines the very basic rules how characters are represented in binary format (like @a_horse explains in his comment). It should be mentioned that the server encoding has to match the client encoding for successful communication. Postgres can translate if necessary, there is a dedicated setting client_encoding for this.The
locale is a superset of settings, which can be split up for PostgreSQL intoLC_COLLATE
LC_CTYPE
LC_MESSAGES
LC_MONETARY
LC_NUMERIC
LC_TIMEThe settings of particular interest for you are
LC_COLLATE (defines how strings are sorted) and LC_TYPE (defines the type of characters).In older versions, these two settings could not be changed after a database had been initialized. Since Postgres 9.1 you can at least override the collation setting when needed.
Code Snippets
LC_COLLATE
LC_CTYPE
LC_MESSAGES
LC_MONETARY
LC_NUMERIC
LC_TIMEContext
StackExchange Database Administrators Q#47663, answer score: 4
Revisions (0)
No revisions yet.