gotcha, python, Moderate, pending
Gotcha: Python string encoding and bytes confusion
unicode, encoding, bytes, utf-8, decode, encode
Error Messages
Problem
Confusing str with bytes in Python 3 causes TypeError, UnicodeDecodeError, and garbled text (mojibake) in file I/O, network code, and API calls.
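The three symptoms named above can be reproduced in a few lines. This is only a minimal sketch: the helper name `describe_failures` and the sample word are illustrative assumptions, not part of the original report.

```python
def describe_failures():
    """Trigger each failure mode and record what happened."""
    results = {}

    # Mixing str and bytes in one expression raises TypeError.
    try:
        'Hello' + b' World'
    except TypeError as exc:
        results['mix'] = type(exc).__name__

    # UTF-8 bytes for a non-ASCII character cannot be decoded as ASCII.
    raw = 'caf\u00e9'.encode('utf-8')  # b'caf\xc3\xa9'
    try:
        raw.decode('ascii')
    except UnicodeDecodeError as exc:
        results['ascii'] = type(exc).__name__

    # Decoding the same bytes as Latin-1 does not raise,
    # but each byte becomes its own character, so the text is garbled.
    results['garbled'] = raw.decode('latin-1')
    return results


failures = describe_failures()
print(failures)
```

The first two cases fail loudly. The third is the more dangerous one because it succeeds silently and only shows up later as wrong characters in the output.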
Solution
Python string encoding essentials:
# FUNDAMENTAL: str is Unicode text, bytes is raw data
text = 'Hello' # str (Unicode)
data = b'Hello' # bytes (raw)
# CONVERSION:
data = text.encode('utf-8') # str -> bytes
text = data.decode('utf-8') # bytes -> str
# COMMON ERRORS:
# Error 1: Mixing str and bytes
# 'Hello' + b' World' # TypeError!
# Error 2: Wrong encoding assumption
# data.decode('ascii') # UnicodeDecodeError if data has non-ASCII
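# Error 3: Double encoding
text = 'caf\u00e9' # 'café' (precomposed é, U+00E9)
data = text.encode('utf-8') # b'caf\xc3\xa9'
# BAD: calling encode on bytes
# data.encode('utf-8') # AttributeError (bytes has no encode)
# BAD: decoding with the wrong codec, then re-encoding
# data.decode('latin-1').encode('utf-8') # b'caf\xc3\x83\xc2\xa9' ('cafÃ©')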
# FILE I/O:
# Text mode (default) - decodes for you; pass encoding= explicitly,
# because the default depends on the platform and locale
with open('file.txt', 'r', encoding='utf-8') as f:
    text = f.read() # Returns str
# Binary mode - raw bytes
with open('image.png', 'rb') as f:
    data = f.read() # Returns bytes
# HTTP RESPONSES:
import requests
resp = requests.get('https://api.example.com/data')
resp.text # str (decoded using the charset from the headers, or a guess if none is declared)
resp.content # bytes (raw response body)
resp.json() # Parsed JSON (decoded automatically)
# HANDLING UNKNOWN ENCODING:
try:
    text = data.decode('utf-8')
except UnicodeDecodeError:
    # latin-1 never raises (1:1 byte mapping), but may yield the wrong characters
    text = data.decode('latin-1')
# Or: text = data.decode('utf-8', errors='replace') # Uses U+FFFD
# Or: text = data.decode('utf-8', errors='ignore') # Drops bad bytes
# SUBPROCESS:
import subprocess
result = subprocess.run(['ls'], capture_output=True, text=True)
result.stdout # str (with text=True)
# Without text=True, stdout is bytes
Why
Python 3 enforces the distinction between text (str) and binary data (bytes). This prevents the silent data corruption that plagued Python 2, but requires explicit encoding/decoding.
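One common way to live with this rule is to decode once when data enters the program and encode once when it leaves, keeping everything in between as str. The sketch below assumes two hypothetical helpers, `to_text` and `to_bytes`, and an arbitrary fallback order of UTF-8 then cp1252; neither the names nor the order come from the original entry, and the right candidates depend on where the data comes from.

```python
def to_text(data, encodings=('utf-8', 'cp1252')):
    """Return str, trying each candidate encoding in order.

    If none of them decodes cleanly, fall back to UTF-8 with U+FFFD
    replacement characters, so this function never raises on bytes input.
    """
    if isinstance(data, str):
        return data  # already text, nothing to decode
    for name in encodings:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue
    return data.decode('utf-8', errors='replace')


def to_bytes(text, encoding='utf-8'):
    """Return bytes, encoding str input with the given encoding."""
    if isinstance(text, bytes):
        return text  # already raw data, nothing to encode
    return text.encode(encoding)


# Decode at the input edge, work with str, encode at the output edge.
incoming = b'caf\xc3\xa9'
word = to_text(incoming)
outgoing = to_bytes(word.upper())
print(word, outgoing)
```

Trying UTF-8 first is deliberate: text in another encoding rarely happens to be valid UTF-8, so a successful UTF-8 decode is a strong signal, whereas single-byte codecs such as Latin-1 accept nearly anything and cannot tell you when they are wrong.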
Context
Python text processing and I/O