patternsqlMinor
Having charset "utf8mb4" - does it mean every character takes 4 bytes, or only those that need 4 bytes?
Viewed 0 times
thoseneedmeanhavingeverycharacterthatbytesdoesutf8mb4
Problem
I have a
From reading in different places, my understanding is that each character on a
Is it correct that a row with "aaaa" contents would take less storage than a row with ""?
VARCHAR(80) utf8mb4 column, and I'm testing adding some ASCII & Emoji characters, and use LENGTH() and CHAR_LENGTH() to understand the differences.From reading in different places, my understanding is that each character on a
utf8mb4 column would take 4 bytes. However, it appears that it may not be the case, if I understand LENGTH() as giving me the actual size that the specific contents are taking on that field.Is it correct that a row with "aaaa" contents would take less storage than a row with ""?
Solution
Actually, my answer is in the Documentation:
https://mariadb.com/kb/en/unicode/
UTF-8 encoding using one to three bytes per character. Basic Latin
letters, numbers and punctuation use one byte. European and Middle
East letters mostly fit into 2 bytes. Korean, Chinese, and Japanese
ideographs use 3-bytes. No supplementary characters are stored.
utf8mb4 -- Same as utf8, but stores supplementary characters in four
bytes.
Also, as per https://mariadb.com/kb/en/data-type-storage-requirements/
String Data Types
In the descriptions below, M is the declared column length (in characters or in bytes), while len is the actual length in bytes of the value.
VARCHAR(M) -- len + 1 bytes if column is 0 – 255 bytes, len + 2 bytes if column may require more than 255 bytes
So, for
80x4 > 255, so a field with "aaaa" will take 6 bytes (len + 2)
https://mariadb.com/kb/en/unicode/
UTF-8 encoding using one to three bytes per character. Basic Latin
letters, numbers and punctuation use one byte. European and Middle
East letters mostly fit into 2 bytes. Korean, Chinese, and Japanese
ideographs use 3-bytes. No supplementary characters are stored.
utf8mb4 -- Same as utf8, but stores supplementary characters in four
bytes.
Also, as per https://mariadb.com/kb/en/data-type-storage-requirements/
String Data Types
In the descriptions below, M is the declared column length (in characters or in bytes), while len is the actual length in bytes of the value.
VARCHAR(M) -- len + 1 bytes if column is 0 – 255 bytes, len + 2 bytes if column may require more than 255 bytes
So, for
VARCHAR(80) utf8mb4,80x4 > 255, so a field with "aaaa" will take 6 bytes (len + 2)
Context
StackExchange Database Administrators Q#285336, answer score: 7
Revisions (0)
No revisions yet.