HiveBrain v1.2.0
snippet · python · Moderate · pending · Canonical

Python generators and itertools -- memory-efficient data processing

Submitted by: @anonymous
generator, yield, itertools, lazy evaluation, streaming, chunked
python

Problem

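Loading an entire large dataset into memory before processing it can cause out-of-memory (OOM) errors. Items need to be processed lazily, one at a time or in small batches, so that memory use stays bounded regardless of input size.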

Solution

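Use generator functions (yield) to produce items only when the consumer asks for them, and itertools (for example islice) to slice or batch the stream without materializing it as a full list.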
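To make the lazy-evaluation point concrete, here is a minimal sketch. The squares generator is a hypothetical example, not part of the original entry; it is infinite, so it only works because values are computed on demand.

```python
import itertools
from typing import Iterator

def squares() -> Iterator[int]:
    """Hypothetical infinite generator: yields 0, 1, 4, 9, ... on demand."""
    for n in itertools.count():
        yield n * n

# Nothing runs until a consumer asks for values; islice stops after five items.
first_five = list(itertools.islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# takewhile ends the stream at the first value that fails the predicate.
below_50 = list(itertools.takewhile(lambda x: x < 50, squares()))
print(below_50)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Building the same sequence as a list would never finish; the generator version holds only the current value in memory.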

Code Snippets

Generators and itertools for streaming data

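import itertools
import json
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')

def chunked(iterable: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of up to `size` items; the last may be shorter."""
    # Checked on first iteration, since a generator body runs lazily.
    if size < 1:
        raise ValueError('size must be at least 1')
    it = iter(iterable)
    # islice pulls at most `size` items; an empty list ends the loop.
    while chunk := list(itertools.islice(it, size)):
        yield chunk

def read_large_file(path: str) -> Iterator[dict]:
    """Lazily parse a JSON Lines file, one object per non-blank line."""
    with open(path, encoding='utf-8') as f:
        for line in f:  # the file object yields one line at a time
            if line.strip():
                yield json.loads(line)

# process_batch is a placeholder for the caller's own batch handler.
for batch in chunked(read_large_file('data.jsonl'), 1000):
    process_batch(batch)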
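The snippet above depends on a data.jsonl file and a process_batch function that the entry does not define. The following is a self-contained sketch of the same pipeline that can be run as-is. It assumes an in-memory io.StringIO as a stand-in for the file, and read_json_lines is a hypothetical variant of read_large_file that accepts any iterable of lines.

```python
import io
import itertools
import json
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')

def chunked(iterable: Iterable[T], size: int) -> Iterator[list[T]]:
    """Same batching helper as in the snippet above."""
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

def read_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Hypothetical variant of read_large_file that takes any iterable of lines."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield json.loads(line)

# Seven records plus one blank line, standing in for a file on disk.
sample = io.StringIO(
    ''.join(json.dumps({'id': i}) + '\n' for i in range(7)) + '\n'
)

# Only one batch of at most three records exists in memory at a time.
batch_sizes = [len(batch) for batch in chunked(read_json_lines(sample), 3)]
print(batch_sizes)  # [3, 3, 1]
```

On Python 3.12 and later, itertools.batched offers similar batching in the standard library, though it yields tuples rather than lists.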
