HiveBrain v1.2.0
snippet · python · Moderate · pending

Python generators and itertools — memory-efficient data processing

Submitted by: @anonymous
Tags: generator, yield, itertools, lazy evaluation, streaming, chunked
Language: python

Problem

Loading a large dataset entirely into memory before processing causes out-of-memory (OOM) errors. Items need to be processed one at a time, in a streaming fashion.
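
For contrast, here is a minimal sketch of the anti-pattern this snippet avoids (the file name matches the hypothetical data.jsonl used below):

import json

# Anti-pattern: parse every line up front and hold the whole list in memory.
# A multi-gigabyte JSONL file needs multiple gigabytes of RAM this way.
with open('data.jsonl') as f:
    records = [json.loads(line) for line in f if line.strip()]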

Solution

Use generators (functions that yield) for lazy evaluation and itertools for efficient iteration patterns. A generator produces values on demand, so the full sequence is never materialized in memory, and most itertools functions operate just as lazily on any iterator.
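
As a minimal illustration of lazy versus eager evaluation (squares_list and squares_gen are illustrative names, not part of the snippet):

def squares_list(n):
    # Eager: builds the entire list before returning; memory grows with n.
    return [i * i for i in range(n)]

def squares_gen(n):
    # Lazy: yields one value at a time; memory use stays constant.
    for i in range(n):
        yield i * i

total = sum(squares_gen(10_000_000))  # consumes values as they are produced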

Code Snippets

Generators and itertools for streaming data

import itertools
import json
from typing import Iterator, Iterable, TypeVar

T = TypeVar('T')

def chunked(iterable: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive chunks from iterable."""
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

def read_large_file(path: str) -> Iterator[dict]:
    """Stream JSON lines without loading entire file."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Process 1M records in chunks of 1000
for batch in chunked(read_large_file('data.jsonl'), 1000):
    process_batch(batch)

# itertools recipes (huge_generator, items, nested_lists are illustrative names)
first_10 = itertools.islice(huge_generator(), 10)        # lazily take the first 10 items
unique = list(dict.fromkeys(items))                      # order-preserving de-duplication (dicts keep insertion order in 3.7+)
flattened = itertools.chain.from_iterable(nested_lists)  # lazily flatten one level of nesting
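
Note: on Python 3.12+ the standard library offers itertools.batched, which covers the same chunking pattern as the chunked helper above, yielding tuples instead of lists:

# Python 3.12+ only; each batch is a tuple of up to 1000 records
for batch in itertools.batched(read_large_file('data.jsonl'), 1000):
    process_batch(list(batch))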

Revisions (0)

No revisions yet.