HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

issues with encoding and pg_dump/restore between windows and linux

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
pg_dumpissueswithbetweenwindowsencodingandlinuxrestore

Problem

I have a dmp file I have created using pg_dump encoded in WIN1252. What I would like to do is import it using pg_restore to a database on a Linux machine. The problem is that on the windows machine postgreSQL recognizes the encoding WIN1252 but not the CP1252 encoding (they both are identical just different names for the same standard). On the Linux machine, postgreSQL recognizes CP1252 but not WIN1252. So when I try and do a pg_restore I get an error because WIN1252 is not recognized as a proper encoding. Also, if I try to do a pg_dump on windows and specify the encoding as CP1252, I get a similar error. I would like to stick with this encoding format so doing a pg_dump in utf8 or something is not something I would like to do. Any help would be appreciated!

EDIT:
Actually, upon further investigation, postgreSQL does not understand cp1252. The iconv linux command used to convert a text file from one encoding to another understands cp1252 and not win1252 and thats where I got it from. On the Linux machine, I still cannot set the client encoding to WIN1252. This is the error message:

FATAL:  conversion between WIN1252 and LATIN1 is not supported


The variable client_encoding was originally set to UTF8 on the Linux machines so I dont know where LATIN1 is coming from. Unless WIN1252 is not supported so postgres tries to convert it to latin1 because that is the closest thing to it and fails at achieving that due to some character miss-match?

Solution

I'm late to the party, but I think that the accepted answer doesn't tell the whole story.

If you're using the syntax
pg_dump ... > out.sql

, then it's actually the shell that's writing the file to disk, not pg_dump. Sometimes the shell in Windows will "helpfully" translate it into another charset as it writes the file. I found that cmd.exe on Windows 11 was writing my output to some form of UTF-16 (not sure if it was LE or BE) instead of the UTF-8 that I had specified with -E. As a result the file was double the size I was expecting!

To be really safe, it's probably best to use the -E and the -f option to specify the filename:

pg_dump -E UTF-8 -f out.sql


That way, the shell redirection (and possible charset translation) is bypassed, and pg_dump writes directly to the output file and hence is in complete control of the encoding process.

As a bonus, the shell's charset translation is also an extra performance overhead, so writing directly to the file might be slightly faster too.

Code Snippets

pg_dump ... > out.sql
pg_dump -E UTF-8 -f out.sql

Context

StackExchange Database Administrators Q#35112, answer score: 8

Revisions (0)

No revisions yet.