HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

No NULLs, yet invalid byte sequence for encoding "UTF8": 0x00

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
byteutf80x00sequenceyetforencodinginvalidnulls

Problem

I've spent the last 8 hours trying to import the output of 'mysqldump --compatible=postgresql' into PostgreSQL 8.4.9, and I've read at least 20 different threads here and elesewhere already about this specific problem, but found no real usable answer that works.

MySQL 5.1.52 data dumped:

mysqldump -u root -p --compatible=postgresql --no-create-info --no-create-db --default-character-set=utf8 --skip-lock-tables rt3 > foo


PostgreSQL 8.4.9 server as destination

Loading the data with 'psql -U rt_user -f foo' is reporting (many of these, here's one example):

psql:foo:29: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".


According the following, there are no NULL (0x00) characters in the input file.

database-dumps:rcf-temp1# sed 's/\x0/ /g'  nonulls
database-dumps:rcf-temp1# sum foo nonulls
04730 2545610 foo
04730 2545610 nonulls
database-dumps:rcf-temp1# rm nonulls


Likewise, another check with Perl shows no NULLs:

database-dumps:rcf-temp1# perl -ne '/\000/ and print;' foo
database-dumps:rcf-temp1#


As the "HINT" in the error mentions, I have tried every possible way to set 'client_encoding' to 'UTF8', and I succeed but it has no effect toward solving my problem.

database-dumps:rcf-temp1# psql -U rt_user --variable=client_encoding=utf-8 -c "SHOW client_encoding;" rt3
 client_encoding
-----------------
 UTF8
(1 row)

database-dumps:rcf-temp1#


Perfect, yet:

database-dumps:rcf-temp1# psql -U rt_user -f foo --variable=client_encoding=utf-8 rt3
...
psql:foo:29: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
...


Barring the "According to Hoyle" correct answer, which would be fantastic to hear, and knowing that

Solution

I had the same problem using MySQL version 5.0.51 and Postgres version 9.3.4.0. I solved the "invalid byte sequence for encoding "UTF8": 0x00" issue after seeing Daniel Vérité's comment that "mysqldump in postgresql mode will dump null bytes as \0 in strings, so you probably want to search for that sequence of characters."

Sure enough a grep finally revealed the NULL chars.

grep \\\\0 dump.sql


I replaced the NULL chars using the following command

sed -i BAK 's/\\0//g' dump.sql


Postgres was then able to successfully load dump.sql

Code Snippets

grep \\\\0 dump.sql
sed -i BAK 's/\\0//g' dump.sql

Context

StackExchange Database Administrators Q#9792, answer score: 7

Revisions (0)

No revisions yet.