HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

Python caching generator

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
cachingpythongenerator

Problem

Is this the correct way to cache a generator in Python, without iterating over it right away (for example using list() over it)?

@property
def results(self):
    def caching_iterator(it, r):
        for t in it:
            yield t
            r.append(t)

    if self._results:  # _results is [] initially
        return self._results
    return caching_iterator(self.run(), self._results)


This is a property of a class, as you can see. Basically, the idea is that the first time the property is accessed, a generator is returned. So the results are actually computed as the need arises. All the other subsequent times, the results are already inside a list, so they are returned straight away.

Are there simpler ways to achieve the same result? I like mine quite a bit, but I have one argument against it. The second argument of caching_iterator is a mutable type, a list. Since that object gets modified inside the function, this is potentially dangerous. However, I don't see how that could happen. The first time _results is None and caching_iterator does its job. But all the other times, the if test passes and _results is directly returned.

Solution

Memoizing a generator is a good idea to save having to rerun it. There are a few ways to do it - one is to handle it the way you're handling it, by returning the cached result if it exists, otherwise returning and caching the new result.

The first way, is to the the tee() method from itertools. Such as follows:

from itertools import tee

def run():
    for i in range(1000):
        yield i

sequence, memoized_sequence = tee(run())


This returns a 32 byte object:



which you can then iterate over as you normally would:

for x in memoized_sequence:
    print x


Another way to do it, if the generator consistently returns small sets of data, is to just use list(). This is especially true if, like aforementioned, the generator returns small sets of data and that data is not dependent on an action (ie, data is static) and that data is not kept in memory for a long time (such as in a child that is killed (and its memory freed) once its served its purpose). The way to do so, is:

def run():
    for i in range(1000):
        yield i

memoized_sequence = list(run())


However, this set of 1000 integers while have a size of 4.2KB in memory. So its not ideal, nor memory efficient, but its an option.

Now, the way that you did it, as follows:

_results = []

def run():
    for i in range(1000):
        yield i

def results():
    def caching_iterator(it, r):
        for t in it:
            yield t
            r.append(t)

    if _results:  # _results is [] initially
        return _results
    return caching_iterator(run(), _results)


Would result in a 36 byte object:



I would suggest using the system provided tee function over the custom memoization technique, so your code would look like

from itertools import tee

def results():
    _, memoized_sequence = tee(self.run())
    return memoized_sequence


Doing so shaves you 4 bytes from the returned iterator object, as well as all the variable assignments and the added function and compacts your code.

EDIT

Seems that tee() does not memoize as I thought - it just returns an iterator. However, list() does so, though statically:

import random
from itertools import tee

def run():
    for i in range(5):
        x = random.randint(0, 100) * i
        print "calculating"
        yield x

_, memoized = tee(run())

for x in memoized:
    print x

l = list(run())
print l
print l


Results in:

calculating
0
calculating
92
calculating
94
calculating
165
calculating
100
calculating
calculating
calculating
calculating
calculating
[0, 43, 106, 222, 8]
[0, 43, 106, 222, 8]

Code Snippets

from itertools import tee

def run():
    for i in range(1000):
        yield i

sequence, memoized_sequence = tee(run())
<itertools.tee object at 0x23a564>
<itertools.tee object at 0x23a58c>
for x in memoized_sequence:
    print x
def run():
    for i in range(1000):
        yield i

memoized_sequence = list(run())
_results = []

def run():
    for i in range(1000):
        yield i

def results():
    def caching_iterator(it, r):
        for t in it:
            yield t
            r.append(t)

    if _results:  # _results is [] initially
        return _results
    return caching_iterator(run(), _results)

Context

StackExchange Code Review Q#60056, answer score: 10

Revisions (0)

No revisions yet.