HiveBrain v1.2.0
debug · Major · pending

MySQL/MariaDB utf8 is not real UTF-8 — use utf8mb4

Submitted by: @anonymous
utf8mb4, utf8, emoji, incorrect string value, charset, collation, 4 byte
mysql, mariadb

Error Messages

Incorrect string value: '\xF0\x9F' for column
Data too long for column
1366 Incorrect string value

Problem

Inserting emoji or other characters outside the Basic Multilingual Plane fails with an 'Incorrect string value' error (1366). The column charset is set to utf8, yet some valid UTF-8 characters are rejected.
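The rejected characters are the ones whose UTF-8 encoding is 4 bytes long, that is, code points above U+FFFF. The following is a minimal Python sketch for spotting them in a string before it reaches the database. The helper name `four_byte_chars` and the sample text are illustrative assumptions, not part of any MySQL tooling.

```python
def four_byte_chars(text: str) -> list[str]:
    """Return the characters in `text` whose UTF-8 encoding needs 4 bytes."""
    # Code points above U+FFFF lie outside the Basic Multilingual Plane
    # and take 4 bytes in UTF-8, which a 3-byte utf8 column cannot store.
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Illustrative sample: e-acute (2 bytes), euro sign (3 bytes), grinning face (4 bytes)
sample = "caf\u00e9 \u20ac \U0001F600"

for ch in sample:
    if ch != " ":
        print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s)")

# Only the emoji is reported as a problem character.
print(four_byte_chars(sample))

# The encoded emoji begins with the bytes \xf0\x9f, matching the error message.
print("\U0001F600".encode("utf-8"))
```

A non-empty result from `four_byte_chars` means the value would likely trigger the error on a utf8 column.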

Solution

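The MySQL utf8 charset (currently an alias for utf8mb3) stores at most 3 bytes per character, which covers only the Basic Multilingual Plane. Emoji and other supplementary-plane characters need 4 bytes in UTF-8. Use the utf8mb4 charset with a utf8mb4 collation such as utf8mb4_unicode_ci instead. To convert an existing table: ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; Also update the connection string to specify charset=utf8mb4, because the connection charset must be utf8mb4 as well; otherwise 4-byte characters can still be lost between client and server.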
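When several tables need converting, the statement can be generated rather than typed by hand. This is a minimal sketch that only builds the SQL text; it does not connect to a database. The function name `convert_statement` and the table names are hypothetical, and the generated statements should be reviewed before being run against real data.

```python
def convert_statement(table: str,
                      charset: str = "utf8mb4",
                      collation: str = "utf8mb4_unicode_ci") -> str:
    """Build the ALTER TABLE statement that converts `table` to `charset`."""
    # Quote the identifier with backticks, doubling any embedded backtick.
    quoted = "`" + table.replace("`", "``") + "`"
    return (f"ALTER TABLE {quoted} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};")


# Hypothetical table names, for illustration only.
for name in ["users", "comments"]:
    print(convert_statement(name))
```

Converting rewrites the table, so on large tables it may be worth running the statements one at a time during a quiet period.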

Why

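MySQL originally implemented utf8 as an encoding capped at 3 bytes per character, reportedly for performance and storage reasons. When full UTF-8 (up to 4 bytes per character) was needed later, the meaning of utf8 could not be changed without breaking compatibility with existing schemas, so utf8mb4 was added as a separate charset in MySQL 5.5.3.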
