Python generators and itertools -- memory-efficient data processing
generator, yield, itertools, lazy evaluation, streaming, chunked
python
Problem
Loading an entire large dataset into memory at once can exhaust RAM and cause out-of-memory (OOM) errors. The items need to be processed lazily instead, one at a time or in small batches.
Solution
Use generators (yield) to produce items on demand, and itertools to slice and batch iterators without building intermediate lists.
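As a minimal sketch of what lazy evaluation buys, the example below uses a hypothetical generator named squares (not part of the original snippet). It assumes only the standard library and shows that values are computed only when a consumer asks for them.

```python
import itertools
from typing import Iterator


def squares(limit: int) -> Iterator[int]:
    """Yield squares one at a time instead of building a list."""
    for n in range(limit):
        yield n * n


# Only the first five values are ever computed, even though the
# generator could in principle produce 10**12 of them.
first_five = list(itertools.islice(squares(10**12), 5))

# sum() consumes the generator item by item, so no full list is held in memory.
total = sum(squares(1_000))
```

Building the equivalent list for 10**12 items would not fit in memory, whereas the generator version holds one value at a time.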
Code Snippets
Generators and itertools for streaming data
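import itertools
import json
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')


def chunked(iterable: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of up to `size` items; the last may be shorter."""
    if size < 1:
        raise ValueError('size must be at least 1')
    it = iter(iterable)
    # islice pulls at most `size` items per pass; an empty list ends the loop.
    while chunk := list(itertools.islice(it, size)):
        yield chunk


def read_large_file(path: str) -> Iterator[dict]:
    """Lazily parse a JSON Lines file, one record per non-blank line."""
    with open(path, encoding='utf-8') as f:
        for line in f:  # file objects iterate line by line
            if line.strip():
                yield json.loads(line)


# process_batch is a placeholder for your own batch handler.
for batch in chunked(read_large_file('data.jsonl'), 1000):
    process_batch(batch)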
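The same pieces can be chained into a longer lazy pipeline. The sketch below is illustrative only: parse_jsonl, the "ok" and "id" fields, and the in-memory sample data are assumptions made so the example runs without a file on disk. chunked is repeated (without type hints) to keep the block self-contained.

```python
import io
import itertools
import json
from typing import Iterable, Iterator


def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse each non-blank line as JSON, one record at a time."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


# Hypothetical sample data standing in for a JSONL file; blank lines
# between records are skipped by parse_jsonl.
rows = [json.dumps({"id": i, "ok": i != 4}) for i in range(1, 6)]
sample = io.StringIO("\n\n".join(rows))

records = parse_jsonl(sample)                # stage 1: parse lazily
valid = (r for r in records if r["ok"])      # stage 2: filter lazily
ids = (r["id"] for r in valid)               # stage 3: project lazily

# Nothing has been read yet; consuming the batches drives the whole pipeline.
batches = list(chunked(ids, 3))
```

Each stage holds only the item currently passing through it, so peak memory is bounded by the batch size rather than the input size. On Python 3.12 and later, itertools.batched provides similar chunking but yields tuples instead of lists.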