patternsqlMinor
Remove characters from PostgreSQL database causing encoding errors
Viewed 0 times
postgresqlremovedatabasecausingencodingcharacterserrorsfrom
Problem
I have a PostgreSQL 10 database that uses WIN1252 encoding.
One of my columns has values that cause conversion errors when running a select from pgAdmin 4:
Returns:
Setting the client encoding to WIN1252 in pgAdmin 4 causes it to drop the connection to the database (I remember reading somewhere that pgAdmin 4 has issues with setting the client encoding, but track down the link).
Using psql and setting the client encoding to WIN1252, selecting one of the offending features:
Returns:
Looking at the data that was originally imported into the DB, it appears that “curly quotes” were incorrectly imported.
Is there a way to remove these characters from the column - either replacing them with regular quotation marks, or simply deleting them?
One of my columns has values that cause conversion errors when running a select from pgAdmin 4:
SELECT myfield FROM mydb.myschema."MYTABLE"Returns:
ERROR: character with byte sequence 0x9d in encoding "WIN1252" has no
equivalent in encoding "UTF8"
SQL state: 22P05Setting the client encoding to WIN1252 in pgAdmin 4 causes it to drop the connection to the database (I remember reading somewhere that pgAdmin 4 has issues with setting the client encoding, but track down the link).
Using psql and setting the client encoding to WIN1252, selecting one of the offending features:
SELECT myfield
FROM mydb.myschema."MYTABLE"
WHERE oid = 12345Returns:
“A sample commentLooking at the data that was originally imported into the DB, it appears that “curly quotes” were incorrectly imported.
Is there a way to remove these characters from the column - either replacing them with regular quotation marks, or simply deleting them?
Solution
Looking at the WIN1252 character set:
https://en.wikipedia.org/wiki/Windows-1252 , there is no character corresponding to that hexadecimal code
That's why it "has not equivalent" in UTF8: that character doesn't exist in the first place in the source map.
It doesn't error out when displayed or input in a session configured in
Anyway, you might want to remove these with statements like (when using the WIN1252 client encoding):
Use psql if WIN1252 is unusable with PgAdmin. Once these characters are removed, you'll be able to switch later to UTF8 client encoding.
https://en.wikipedia.org/wiki/Windows-1252 , there is no character corresponding to that hexadecimal code
9D. The same goes for 81, 8D, 8F, 90.That's why it "has not equivalent" in UTF8: that character doesn't exist in the first place in the source map.
It doesn't error out when displayed or input in a session configured in
WIN1252 encoding. When using the proper code page (chcp 1252) with psql.exe on top of cmd.exe, these characters are displayed in my environment as a question mark inside a rectangular box.Anyway, you might want to remove these with statements like (when using the WIN1252 client encoding):
UPDATE tablename SET field=replace(field, chr(x'9D'::int), '')
WHERE strpos(field,chr(x'9D'::int))>0;Use psql if WIN1252 is unusable with PgAdmin. Once these characters are removed, you'll be able to switch later to UTF8 client encoding.
Code Snippets
UPDATE tablename SET field=replace(field, chr(x'9D'::int), '')
WHERE strpos(field,chr(x'9D'::int))>0;Context
StackExchange Database Administrators Q#194857, answer score: 4
Revisions (0)
No revisions yet.