HiveBrain v1.2.0
Get Started
← Back to all entries
debugMinor

Unicode error with \u0000 on COPY of large JSON file into Postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
errorfilepostgreswithintounicodelargejsonu0000copy

Problem

I'm using Postgres 9.4 on Ubuntu 14.04.

I have a 30GB JSON file that I'm trying to COPY into Postgres.

But I keep getting the following error:

COPY (comment_jsonb) FROM '' WITH (format csv, quote e'\x01', delimiter e'\x02', escape e'\x01');
ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.


I tried searching for the offending Unicode string to remove it, but because it's \u0000, which is NULL, I can't seem to get a regex to work. And when I print it into console, I have no idea where the NULL is, because (at least my guess is) it seems to print as nothing (though I have no idea what it is actually doing).

Is there a way to skip these errors?

Alternatively, how could I replace any instances of that Unicode in my JSON file?

Solution

In a (perhaps) similar case I had been confronted with \u0000 in a string I needed to process as JSON. The follow replace worked for me:

regexp_replace(stringWithNull, '\\u0000', '', 'g')


Hope this helps.

Code Snippets

regexp_replace(stringWithNull, '\\u0000', '', 'g')

Context

StackExchange Database Administrators Q#115029, answer score: 5

Revisions (0)

No revisions yet.