Debug: Python UnicodeDecodeError when reading files
Tags: UnicodeDecodeError, encoding, utf-8, latin-1, chardet, BOM
Problem
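Python raises UnicodeDecodeError when it reads a file containing bytes that are not valid in the encoding used to decode them. This usually means the file was written in a different encoding from the one passed to open(), or from the platform default when no encoding is given.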
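To make the failure concrete, here is a minimal sketch that reproduces it in memory, with no file needed. The sample text is made up for illustration; the attributes read from the exception (encoding, start, end, object) are standard on UnicodeDecodeError and help locate the offending byte.

```python
# "caf\u00e9 au lait" is "café au lait". In cp1252 the é is the single
# byte 0xE9, which is not valid UTF-8 when followed by a space.
data = "caf\u00e9 au lait".encode("cp1252")

caught = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    caught = exc  # keep a reference; "exc" is cleared after the except block

if caught is not None:
    # The exception records which codec failed and where.
    bad_bytes = caught.object[caught.start:caught.end]
    print(f"codec={caught.encoding} position={caught.start} bytes={bad_bytes!r}")
    print(f"reason: {caught.reason}")
```

The reported position is a byte offset into the data, so it can be used to inspect the surrounding bytes of the real file opened in 'rb' mode.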
Solution
Diagnosis and fixes:
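- Detect file encoding:
    # Using chardet (third-party: pip install chardet)
    import chardet
    with open('file.txt', 'rb') as f:
        result = chardet.detect(f.read())
    print(result)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73}
    # The result is a statistical guess; treat low confidence with caution.
    # Using the file command
    # file -I filename.txt   (macOS; on Linux use: file -i filename.txt)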
- Read with the correct encoding:
    with open('file.txt', encoding='utf-8') as f: ...
    with open('file.txt', encoding='latin-1') as f: ...  # Never raises, but a wrong guess yields mojibake
    with open('file.txt', encoding='cp1252') as f: ...   # Common on Western European Windows
- Handle errors gracefully:
    with open('file.txt', encoding='utf-8', errors='replace') as f: ...
    # errors='replace' substitutes U+FFFD (the replacement character) for undecodable bytes
    # errors='ignore' silently drops undecodable bytes
    # errors='backslashreplace' keeps them as \xNN escapes
- Common encodings by source:
    - Modern: UTF-8
    - Windows: cp1252 (Western), cp1251 (Cyrillic)
    - Legacy web: ISO-8859-1 / latin-1
    - Excel CSV export: cp1252 or UTF-8 with BOM (utf-8-sig)
- CSV with BOM:
    import csv
    with open('file.csv', encoding='utf-8-sig', newline='') as f:
        reader = csv.reader(f)
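The fixes above can be combined into one fallback reader. The sketch below is one possible arrangement, not a standard recipe: the helper names decode_with_fallback and read_text_with_fallback are hypothetical, and the order (utf-8-sig, then cp1252, then latin-1) is an assumption that suits Western European data. Adjust the fallbacks for other sources, for example cp1251 for Cyrillic.

```python
from pathlib import Path


def decode_with_fallback(raw, fallbacks=("cp1252",)):
    """Return (text, encoding_used) for raw bytes. Hypothetical helper."""
    # utf-8-sig decodes plain UTF-8 as well and strips a leading BOM
    # if one is present, so it covers both cases in a single attempt.
    for encoding in ("utf-8-sig", *fallbacks):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this never raises,
    # but the text may be mojibake if the real encoding was different.
    return raw.decode("latin-1"), "latin-1"


def read_text_with_fallback(path, fallbacks=("cp1252",)):
    """Read a file as bytes and decode it with the fallback chain above."""
    return decode_with_fallback(Path(path).read_bytes(), fallbacks)


if __name__ == "__main__":
    # In-memory samples standing in for file contents.
    print(decode_with_fallback(b"\xef\xbb\xbfname,city\n"))      # BOM-prefixed UTF-8
    print(decode_with_fallback("caf\u00e9".encode("cp1252")))    # legacy Windows text
```

Strict UTF-8 goes first because it rejects most non-UTF-8 input, which makes a successful decode a strong signal. latin-1 goes last because it accepts anything and therefore confirms nothing. Returning the encoding alongside the text lets the caller log or verify which guess was used.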