patternsqlMinor
Does latin1 have performance benefits over utf8?
Viewed 0 times
utf8benefitsperformancedoeslatin1overhave
Problem
I have a table in
MySQL doc says:
To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum possible character length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column.
I made a test - created 2 tables with the same 50M records:
but MySQL says that they have almost the same size:
Why does it so?
P.S: I made the same test with MyISAM and got expected benefit: table with latin1 - 383Mb, utf8 - 1Gb. But why it does not work for InnoDB?
utf8 with > 80M records and one of the columns (char(6) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL) can contain just latin symbols ([a-zA-Z0-9]). Does it have the sense to convert this column into latin1?MySQL doc says:
To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum possible character length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column.
I made a test - created 2 tables with the same 50M records:
CREATE TABLE `t_utf8` (
`c_1` char(6) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
UNIQUE KEY `index_t_utf8_on_c_1` (`c_1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
CREATE TABLE `t_lat` (
`c_1` char(6) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
UNIQUE KEY `index_t_lat_on_c_1` (`c_1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;but MySQL says that they have almost the same size:
Name: t_lat
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 57557700
Avg_row_length: 30
Data_length: 1741668352
Max_data_length: 0
Index_length: 0
Data_free: 2097152
Auto_increment: NULL
Collation: utf8_general_ci
Create_options: row_format=COMPACT
Name: t_utf8
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 57554528
Avg_row_length: 31
Data_length: 1810874368
Max_data_length: 0
Index_length: 0
Data_free: 3145728
Auto_increment: NULL
Collation: utf8_general_ci
Create_options: row_format=COMPACTWhy does it so?
- MySQL 5.7
- InnoDB
P.S: I made the same test with MyISAM and got expected benefit: table with latin1 - 383Mb, utf8 - 1Gb. But why it does not work for InnoDB?
Solution
Short answer: Don't bother.
Long answer:
Do not use
Get in the habit of explicit saying ascii or utf8mb4 when you create the column/table unless you have an unusual case where you need something else.
Long answer:
- https://bugs.mysql.com/bug.php?id=84440 -- you may or may not save space.
- InnoDB is inherently bigger than MyISAM.
- Comparing characters in utf8 is slightly slower than in latin1. But you will probably not notice.
- The tiny difference between 1741668352 abd 1810874368 is probably due to the random nature of how you build one table from the other. Or the phase of the moon, not something significant.
- The 30 vs 31 comes from how InnoDB estimates things.
Do not use
CHAR except for truly fixed-length strings. Almost always they are ascii, such as country_code, postal_code, UUID, hex, md5, etc.Get in the habit of explicit saying ascii or utf8mb4 when you create the column/table unless you have an unusual case where you need something else.
Context
StackExchange Database Administrators Q#161045, answer score: 5
Revisions (0)
No revisions yet.