
Debug: Python UnicodeDecodeError when reading files

Submitted by: @anonymous · Mar 1, 2026


UnicodeDecodeError, encoding, utf-8, latin-1, chardet, BOM

Error Messages

UnicodeDecodeError

codec can't decode byte

invalid start byte

invalid continuation byte

Problem

Python raises UnicodeDecodeError when reading a file in text mode, usually because the file's actual encoding does not match the encoding passed to open() (or the locale-dependent default used when no encoding is given).

Solution

Diagnosis and fixes:

Detect file encoding:

# Using chardet (third-party: pip install chardet)
import chardet
with open('file.txt', 'rb') as f:  # binary mode: read raw bytes, no decoding
    result = chardet.detect(f.read())
print(result)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73}

# Using file command
# file -I filename.txt   (macOS; on Linux use: file -i filename.txt)
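
When chardet is not available, a rough stdlib-only fallback is to try a short list of candidate encodings in order. The sketch below is an illustration only: the function name `guess_encoding` and the candidate order are assumptions, not part of any library. Strict decoders come first and latin-1 comes last, because latin-1 accepts every byte and would otherwise always win.

```python
import codecs

# Hypothetical candidate order: strict encodings first, latin-1 as the catch-all.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def guess_encoding(raw: bytes, candidates=CANDIDATES) -> str:
    """Return the first candidate that decodes `raw` without raising."""
    # A UTF-8 BOM is an unambiguous marker, so check for it before guessing.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding rejects the bytes, try the next one
        return enc
    raise ValueError("no candidate encoding could decode the data")
```

A successful decode only shows that the bytes are valid in that encoding, not that the resulting text is what the author wrote, so the result is a guess in the same sense that chardet's confidence score is.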

Read with correct encoding:

with open('file.txt', encoding='utf-8') as f: ...
with open('file.txt', encoding='latin-1') as f: ...  # Never raises, but a wrong guess silently yields garbled text
with open('file.txt', encoding='cp1252') as f: ...   # Windows (Western European)
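
The reason latin-1 "never fails" deserves a small illustration: it maps every byte to some character, so a wrong guess produces garbled text instead of an exception. The snippet below is a minimal sketch using an in-memory byte string in place of a file.

```python
raw = "café".encode("utf-8")       # b'caf\xc3\xa9': the é takes two bytes in UTF-8

wrong = raw.decode("latin-1")      # no exception, but each byte becomes its own character
right = raw.decode("utf-8")        # the encoding the bytes were actually written in

print(repr(wrong))                 # 'cafÃ©'
print(repr(right))                 # 'café'

# If nothing else has modified the text, this kind of damage can be reversed
# by re-encoding with the wrong codec and decoding with the right one.
repaired = wrong.encode("latin-1").decode("utf-8")
```

In practice this means latin-1 is a safe choice only when the file really is latin-1, or when the goal is to get the bytes through unchanged rather than to read the text.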

Handle errors gracefully:

with open('file.txt', encoding='utf-8', errors='replace') as f: ...
# errors='replace' replaces bad chars with ?
# errors='ignore' skips bad chars
# errors='backslashreplace' shows \xNN
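
To compare the three handlers side by side, the following sketch decodes the same byte string with each one. The sample bytes are made up for illustration: `\xe9` is "é" in latin-1/cp1252 but is not valid UTF-8 in this position.

```python
raw = b"caf\xe9 ok"  # latin-1 bytes being read as UTF-8 by mistake

replaced = raw.decode("utf-8", errors="replace")
ignored = raw.decode("utf-8", errors="ignore")
escaped = raw.decode("utf-8", errors="backslashreplace")

print(repr(replaced))  # 'caf\ufffd ok'  (U+FFFD marks the bad byte)
print(repr(ignored))   # 'caf ok'        (the bad byte is gone without a trace)
print(repr(escaped))   # 'caf\\xe9 ok'   (the original byte value stays visible)
```

Of the three, backslashreplace is the only one that keeps enough information to tell which byte was bad, which tends to help when working out what the real encoding is.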

Common encodings by source:

- Modern: UTF-8
- Windows: cp1252 (Western), cp1251 (Cyrillic)
- Legacy web: ISO-8859-1 / latin-1
- Excel CSV export: cp1252 or UTF-8 with BOM (utf-8-sig)

CSV with BOM:

import csv
with open('file.csv', encoding='utf-8-sig', newline='') as f:  # utf-8-sig strips a leading BOM
    rows = list(csv.reader(f))  # read while the file is still open
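
The effect of the BOM is easiest to see on the header row. The sketch below uses in-memory bytes in place of a real file; the helper name `read_header` and the sample data are assumptions made for illustration.

```python
import csv
import io

# Sample CSV bytes as Excel might write them: a UTF-8 BOM followed by the data.
raw = b"\xef\xbb\xbfname,city\r\nAda,London\r\n"


def read_header(data: bytes, encoding: str) -> list[str]:
    """Decode `data` with the given encoding and return the first CSV row."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return next(csv.reader(text))


print(read_header(raw, "utf-8"))      # ['\ufeffname', 'city']  (BOM stuck to the first field)
print(read_header(raw, "utf-8-sig"))  # ['name', 'city']
```

With plain utf-8 no error is raised, but the first column name carries an invisible U+FEFF prefix, so lookups such as `row['name']` in a DictReader would fail to match.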
