snippetsqlMinor
How to convert control characters in MySQL from latin1 to UTF-8?
Viewed 0 times
controlutfconvertmysqlhowcharacterslatin1from
Problem
While converting a database to UTF-8 I noticed a strange behavior regarding the control characters 0x80-0x9F. For example, 0x92 (right apostrophe) would not get converted to UTF-8 and truncate the rest of the content of a column, using this method:
While normally, 0x80-0x9F wouldn't be allowed in Latin1, MySQL seems to handle it differently:
MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers A
CREATE TABLE `bar` (
`content` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1
INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 1 row affected (0.06 sec)
SELECT content FROM bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.06 sec)
ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected, 1 warning (0.06 sec)
Records: 1 Duplicates: 0 Warnings: 1
SHOW WARNINGS;
+---------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\x80\x81\x82\x83\x84\x85...' for column 'content' at row 1 |
+---------+------+-------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
SELECT * FROM bar;
+---------+
| content |
+---------+
| |
+---------+
1 row in set (0.06 sec)While normally, 0x80-0x9F wouldn't be allowed in Latin1, MySQL seems to handle it differently:
MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers A
Solution
I'm not certain. I tried to start out be reproducing your problem but the alter worked fine for me.
Here's my related char settings
Edit
My char settings before running set names utf8
Version
test > CREATE TABLE `bar` ( `content` text ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 0 rows affected (0.02 sec)
Query OK, 1 row affected (0.00 sec)
test > ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.04 sec)
Records: 1 Duplicates: 0 Warnings: 0
test > select * from bar;
+---------------------------------+
| content |
+---------------------------------+
| ����������������������������� |
+---------------------------------+
1 row in set (0.00 sec)
test > set names utf8;
Query OK, 0 rows affected (0.00 sec)
test > select * from bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)Here's my related char settings
test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+Edit
My char settings before running set names utf8
test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)Version
test > select version();
+-------------------------+
| version() |
+-------------------------+
| 5.1.41-3ubuntu12.10-log |
+-------------------------+
1 row in set (0.00 sec)Code Snippets
test > CREATE TABLE `bar` ( `content` text ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 0 rows affected (0.02 sec)
Query OK, 1 row affected (0.00 sec)
test > ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.04 sec)
Records: 1 Duplicates: 0 Warnings: 0
test > select * from bar;
+---------------------------------+
| content |
+---------------------------------+
| ����������������������������� |
+---------------------------------+
1 row in set (0.00 sec)
test > set names utf8;
Query OK, 0 rows affected (0.00 sec)
test > select * from bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)test > select version();
+-------------------------+
| version() |
+-------------------------+
| 5.1.41-3ubuntu12.10-log |
+-------------------------+
1 row in set (0.00 sec)Context
StackExchange Database Administrators Q#10467, answer score: 4
Revisions (0)
No revisions yet.