patternsqlModerate
How exactly does the one-byte "char" type work in PostgreSQL?
Viewed 0 times
postgresqlthebytechartypeoneworkdoeshowexactly
Problem
I often see people talking about
The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.
And further,
So, if it's one byte, what is the domain and how would you make use of it? Is it signed or unsigned? In this post by @Erwin Brandstetter he lays it out, but I'm still confused. He's using
That's doing something really weird between 10 and 11.
It also gets really weird here:
Why is everything north of 128 being decoded as 128? But to take the bizzare up a bit, after 192 there is a switch and they get decoded as 192..
Erwin says
There are several characters not meant for display. So encode before you store and decode before you display ...
I'm not sure why we should encode at all though if we're doing exactly what that questions asks
That works fine. We can get the ints back out using
"char". I've never used it. It's defined in the docs as,The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.
And further,
"char" 1 byte single-byte internal typeSo, if it's one byte, what is the domain and how would you make use of it? Is it signed or unsigned? In this post by @Erwin Brandstetter he lays it out, but I'm still confused. He's using
ascii() and chr(), and provides thisSELECT i
, chr(i)::"char" AS i_encoded
, ascii(chr(i)::"char") AS i_decoded
FROM generate_series(1,256) i;That's doing something really weird between 10 and 11.
i | i_encoded | i_decoded
-----+-----------+-----------
...
8 | \x08 | 8
9 | | 9
10 | +| 10
| | -- WTF is going on here.
11 | \x0B | 11
12 | \x0C | 12
...It also gets really weird here:
126 | ~ | 126
127 | \x7F | 127
128 | | 128
129 | | 128
130 | | 128
131 | | 128Why is everything north of 128 being decoded as 128? But to take the bizzare up a bit, after 192 there is a switch and they get decoded as 192..
190 | | 128
191 | | 128
192 | | 192
193 | | 192
194 | | 192
195 | | 192
196 | | 192
197 | | 192Erwin says
There are several characters not meant for display. So encode before you store and decode before you display ...
I'm not sure why we should encode at all though if we're doing exactly what that questions asks
CREATE TABLE foo AS
SELECT i::"char"
FROM generate_series(-128,127) i;That works fine. We can get the ints back out using
Solution
chr(10)
... produces the LINEFEED character (a.k.a. escape sequence
\n) and psql displays the character with a newline (indicated by +). Everything correct there.- & 3.
ascii()produces 128 or 192?
It starts with a mistake I made. I carelessly assumed
"char" would cover the range of an unsigned 1-byte integer (0 to 255) in the referenced answer (now fixed), but it's actually the range of a signed 1-byte integer (-128 to 127) internally.ascii() takes a text parameter, the implicit cast from "char" to text produces a multibyte-encoded character in unicode, and the function returns (per documentation on ascii()):ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.
So we get a lot of truncated values. 128 and 192 are byte values for the leading byte of multibyte characters.
- The null byte
The inability to store null bytes only affects regular character types (
text, char, varchar), not "char". It applies to my buggy example, because I cast to text as stepping stone. While casting between "char" and integer directly, the limitation does not apply. The manual on chr():The NULL (0) character is not allowed because text data types cannot
store such bytes.
Not so for "char", where
0 is mapped to the empty string '':SELECT ''::"char"::int -- 0
, 0::"char" = ''; -- tRemember:
"char" is still an "internal" type intended for simple and cheap enumeration. Not officially designed for what we are doing here, and not portable to other RDBMS. There are no guarantees by the Postgres project for this.Code Snippets
SELECT ''::"char"::int -- 0
, 0::"char" = ''; -- tContext
StackExchange Database Administrators Q#166155, answer score: 12
Revisions (0)
No revisions yet.