patternsqlMinor
No NULLs, yet invalid byte sequence for encoding "UTF8": 0x00
Viewed 0 times
byteutf80x00sequenceyetforencodinginvalidnulls
Problem
I've spent the last 8 hours trying to import the output of 'mysqldump --compatible=postgresql' into PostgreSQL 8.4.9, and I've read at least 20 different threads here and elesewhere already about this specific problem, but found no real usable answer that works.
MySQL 5.1.52 data dumped:
PostgreSQL 8.4.9 server as destination
Loading the data with 'psql -U rt_user -f foo' is reporting (many of these, here's one example):
According the following, there are no NULL (0x00) characters in the input file.
Likewise, another check with Perl shows no NULLs:
As the "HINT" in the error mentions, I have tried every possible way to set 'client_encoding' to 'UTF8', and I succeed but it has no effect toward solving my problem.
Perfect, yet:
Barring the "According to Hoyle" correct answer, which would be fantastic to hear, and knowing that
MySQL 5.1.52 data dumped:
mysqldump -u root -p --compatible=postgresql --no-create-info --no-create-db --default-character-set=utf8 --skip-lock-tables rt3 > fooPostgreSQL 8.4.9 server as destination
Loading the data with 'psql -U rt_user -f foo' is reporting (many of these, here's one example):
psql:foo:29: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".According the following, there are no NULL (0x00) characters in the input file.
database-dumps:rcf-temp1# sed 's/\x0/ /g' nonulls
database-dumps:rcf-temp1# sum foo nonulls
04730 2545610 foo
04730 2545610 nonulls
database-dumps:rcf-temp1# rm nonullsLikewise, another check with Perl shows no NULLs:
database-dumps:rcf-temp1# perl -ne '/\000/ and print;' foo
database-dumps:rcf-temp1#As the "HINT" in the error mentions, I have tried every possible way to set 'client_encoding' to 'UTF8', and I succeed but it has no effect toward solving my problem.
database-dumps:rcf-temp1# psql -U rt_user --variable=client_encoding=utf-8 -c "SHOW client_encoding;" rt3
client_encoding
-----------------
UTF8
(1 row)
database-dumps:rcf-temp1#Perfect, yet:
database-dumps:rcf-temp1# psql -U rt_user -f foo --variable=client_encoding=utf-8 rt3
...
psql:foo:29: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
...Barring the "According to Hoyle" correct answer, which would be fantastic to hear, and knowing that
Solution
I had the same problem using MySQL version 5.0.51 and Postgres version 9.3.4.0. I solved the "invalid byte sequence for encoding "UTF8": 0x00" issue after seeing Daniel Vérité's comment that "mysqldump in postgresql mode will dump null bytes as \0 in strings, so you probably want to search for that sequence of characters."
Sure enough a grep finally revealed the NULL chars.
I replaced the NULL chars using the following command
Postgres was then able to successfully load dump.sql
Sure enough a grep finally revealed the NULL chars.
grep \\\\0 dump.sqlI replaced the NULL chars using the following command
sed -i BAK 's/\\0//g' dump.sqlPostgres was then able to successfully load dump.sql
Code Snippets
grep \\\\0 dump.sqlsed -i BAK 's/\\0//g' dump.sqlContext
StackExchange Database Administrators Q#9792, answer score: 7
Revisions (0)
No revisions yet.