
MySQL Collation utf8_unicode differences

Submitted by: @import:stackexchange-dba · Mar 10, 2026


Problem

I've been reading up on the importance of MySQL collations, and from what I have learned so far about compatibility and accuracy, these four seem to be my best bets:

utf8_unicode_ci

utf8_unicode_520_ci

utf8mb4_unicode_ci

utf8mb4_unicode_520_ci

From my understanding, utf8mb4 would be good for multi-language character support (Japanese, for example). utf8 only supports up to 3 bytes per character, while utf8mb4 supports up to 4. So the obvious choice would seem to be utf8mb4, but the catch seems to be that you have a length limit (Damn it! I want to have my cake and eat it too), which is a small concern (I think).

Then there is the 520 standard, which offers more, from what little I could find on it. But that is the issue: I could find very little about it, only that people say it's an improvement while being very vague about how.

I want the most I can get with as few restrictions as possible... I figured someone here may know a thing or two. The official MySQL site wasn't as informative as I had hoped.

Of the 4, which would offer the most compatibility, accuracy, and storage length? Also, what truly is the big difference with the 520 standard?
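
To make the 3-byte versus 4-byte distinction concrete, here is a minimal Python sketch. It relies on Python's own UTF-8 encoder rather than on MySQL, and the sample characters and helper names are illustrative only. The 767-byte figure is an assumption about which "length limit" is meant: it is the InnoDB index key prefix limit under the older COMPACT and REDUNDANT row formats, and other configurations allow more.

```python
# Sketch: how many bytes characters need in UTF-8, and what that means
# for a byte-limited index. Helper names are hypothetical.

def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes, which is the most
    MySQL's 3-byte utf8 character set can store per character."""
    return all(utf8_len(ch) <= 3 for ch in text)


# Assumption: 767-byte InnoDB index key prefix limit (older row formats).
INDEX_LIMIT_BYTES = 767


def max_indexable_chars(max_bytes_per_char: int,
                        limit: int = INDEX_LIMIT_BYTES) -> int:
    """Worst-case number of characters that fit in a fully indexed column."""
    return limit // max_bytes_per_char


if __name__ == "__main__":
    for ch in ["A", "é", "日", "😀"]:
        print(repr(ch), utf8_len(ch), "byte(s)")
    print("utf8 chars under limit:   ", max_indexable_chars(3))
    print("utf8mb4 chars under limit:", max_indexable_chars(4))
```

Under these assumptions, common Japanese characters already fit in 3 bytes, and it is supplementary characters such as emoji that require utf8mb4. The cost is that a fully indexed column holds fewer characters in the worst case (191 rather than 255 under a 767-byte limit).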

Solution

Unicode collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. A collation name such as utf8_unicode_520_ci is based on UCA 5.2.0 weight keys.

See https://dev.mysql.com/doc/refman/5.6/en/charset-collation-names.html.
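
The naming rule quoted above can be sketched as a small Python helper. This is only an illustration of the convention for the four collations named in the question; `uca_version` is a hypothetical function, not part of any MySQL client library, and it assumes the version appears as three digits (as in `520`). Other naming patterns are not handled.

```python
import re

# Matches names like "utf8_unicode_ci" or "utf8mb4_unicode_520_ci".
# Assumption: an optional three-digit version sits between "unicode" and "ci".
_UNICODE_COLLATION = re.compile(
    r"(?P<charset>[a-z0-9]+)_unicode_(?:(?P<ver>\d{3})_)?ci"
)


def uca_version(collation: str) -> str:
    """Return the UCA version implied by a *_unicode_*ci collation name."""
    match = _UNICODE_COLLATION.fullmatch(collation)
    if match is None:
        raise ValueError(f"not a recognised unicode collation name: {collation!r}")
    ver = match.group("ver")
    if ver is None:
        # No version in the name: based on UCA 4.0.0 weight keys.
        return "4.0.0"
    # "520" -> "5.2.0"
    return ".".join(ver)


if __name__ == "__main__":
    for name in (
        "utf8_unicode_ci",
        "utf8_unicode_520_ci",
        "utf8mb4_unicode_ci",
        "utf8mb4_unicode_520_ci",
    ):
        print(name, "->", uca_version(name))
```

In other words, the `520` in the name selects the UCA version used for sorting and comparison, while the `utf8` or `utf8mb4` prefix selects the character set and so determines how many bytes a character may occupy.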

Context

StackExchange Database Administrators Q#65863, answer score: 11
