Python generators and itertools — memory-efficient data processing
Tags: generator, yield, itertools, lazy evaluation, streaming, chunked
Language: Python
Problem
Loading an entire large dataset into memory before processing it causes out-of-memory (OOM) errors. Items need to be processed one at a time, in a streaming fashion.
Solution
Use generators (yield) for lazy evaluation and itertools for efficient iteration patterns. Generators produce values on demand without materializing the full sequence.
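For instance, a generator can describe an effectively unbounded sequence and still be cheap to consume, because each value is computed only when it is requested. A minimal sketch of that laziness (the count_squares name is illustrative, not part of the snippet below):

import itertools

def count_squares():
    """Unbounded generator: each square is computed only when requested."""
    n = 0
    while True:
        yield n * n
        n += 1

first_five = list(itertools.islice(count_squares(), 5))  # [0, 1, 4, 9, 16]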
Code Snippets
Generators and itertools for streaming data
import itertools
from typing import Iterator, Iterable, TypeVar
T = TypeVar('T')
def chunked(iterable: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive chunks from iterable."""
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

def read_large_file(path: str) -> Iterator[dict]:
    """Stream JSON lines without loading entire file."""
    import json
    with open(path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Process 1M records in chunks of 1000
for batch in chunked(read_large_file('data.jsonl'), 1000):
    process_batch(batch)
# itertools recipes
first_10 = itertools.islice(huge_generator(), 10)         # lazily take the first 10 items
unique = list(dict.fromkeys(items))                       # dedupe, preserving order (3.7+)
flattened = itertools.chain.from_iterable(nested_lists)   # flatten one level, lazily
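These pieces compose into a streaming pipeline in which nothing is materialized except the current chunk. A rough sketch, assuming the read_large_file and chunked functions above; the is_valid predicate and write_batch sink are hypothetical, application-specific placeholders:

import itertools

def is_valid(record: dict) -> bool:
    """Placeholder predicate: keep records that carry an 'id' field."""
    return 'id' in record

records = read_large_file('data.jsonl')       # lazy: nothing has been read yet
valid = filter(is_valid, records)             # still lazy
capped = itertools.islice(valid, 100_000)     # bound how much of the stream is consumed
for batch in chunked(capped, 1000):           # only 1000 records in memory at a time
    write_batch(batch)                        # hypothetical sink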