A Python default dictionary which seamlessly saves to disk

Submitted by: @import:stackexchange-codereview

Problem

I sometimes do experiments at work where I separate the computation from the analysis, so that I can run the computation on a cluster and do the analysis locally, sometimes in a Jupyter notebook. I wrote a class that lets me save results to a hidden file as if it were a dictionary. The idea is to create an object specifying the name of the experiment; from there you can use it as a dictionary, and everything is saved to disk so you can access it from other Python files. I'd appreciate any thoughts, since IO isn't my forte. I used Python 2.7, but I think it should work for Python 3.

```python
import os
import cPickle as pickle

class FileDict():

    def __init__(self, name, default = None):
        self.fpath = '.{}.fd'.format(name)
        self.default = default

    def __getitem__(self, key):
        if os.path.isfile(self.fpath):
            d = pickle.load(open(self.fpath))
            if key in d:
                return d[key]
        else:
            return self.default

    def __setitem__(self, key, value):
        if os.path.isfile(self.fpath):
            d = pickle.load(open(self.fpath))
            d[key] = value
        else:
            d = {key : value}
        pickle.dump(d, open(self.fpath, 'w'))

if __name__ == '__main__':
    test = FileDict('test', 0)
    print(test[1])
    test[1] = 'thing'
    print(test[1])
    print(test[2])
```

Solution

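  • I think you should also return the default value when the file exists but the key is not in it. At the moment __getitem__ falls through and returns None in that case, so in your test the last line prints None instead of 0 once the file has been created.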

  • Opening the file every time you get or set a value is expensive. Consider reading it once in the __init__ method, saving the data in a handler registered with the atexit module, and adding a flush() method in case you want to dump the data right away. You could also add a no_cache option to __init__ to force saving things immediately.

  • The pickle module is potentially insecure. Consider the famous Python 2 example import pickle; pickle.loads("cantigravity\n"), which imports the antigravity module as a side effect. If you open a maliciously prepared file with your class, weird things can happen. pickle should be used for trusted, internal data only, not in a general-purpose class that may read arbitrary input.

  • What if multiple instances of FileDict use the same file as their source? Consider using the tempfile module or generating unique names with the uuid module, and then providing a method that returns the generated name.

  • Following the collections.defaultdict example, you might want to use a callable to generate the default value. You can always pass lambda: 0 if you only ever want one value returned.
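Several of the suggestions above (returning the default for a missing key, reading the file once, flushing on exit or on demand, an optional no_cache mode, a generated unique name, and a defaultdict-style callable default) can be combined into one class. The following is a minimal sketch, not the original author's code, assuming Python 3: it uses the plain pickle module (cPickle no longer exists there) and opens files in binary mode with context managers so they are closed. The names default_factory, no_cache, path and the uuid fallback are hypothetical choices for illustration.

```python
import atexit
import os
import pickle  # Python 3: the C accelerator is used automatically
import uuid


class FileDict(object):
    """Dictionary-like store backed by a hidden pickle file (sketch)."""

    def __init__(self, name=None, default_factory=None, no_cache=False):
        # Hypothetical: with no name, generate a unique one via uuid so that
        # two instances do not silently share (and overwrite) the same file.
        if name is None:
            name = uuid.uuid4().hex
        self.fpath = '.{}.fd'.format(name)
        self.default_factory = default_factory
        self.no_cache = no_cache
        # Read the file once here instead of on every get/set.
        if os.path.isfile(self.fpath):
            with open(self.fpath, 'rb') as f:
                self._data = pickle.load(f)
        else:
            self._data = {}
        # Write the in-memory data back when the interpreter exits normally.
        atexit.register(self.flush)

    @property
    def path(self):
        """Return the file path, including a generated name if one was used."""
        return self.fpath

    def __getitem__(self, key):
        if key in self._data:
            return self._data[key]
        # Missing key: use the default whether or not the file exists.
        # Unlike collections.defaultdict, the default is not stored.
        if self.default_factory is None:
            return None
        return self.default_factory()

    def __setitem__(self, key, value):
        self._data[key] = value
        if self.no_cache:
            self.flush()  # hypothetical option: save every write immediately

    def flush(self):
        """Dump the current data to disk right now."""
        with open(self.fpath, 'wb') as f:
            pickle.dump(self._data, f)
```

With this sketch, FileDict('test', default_factory=lambda: 0) behaves like the original for the demo, but a missing key returns 0 even after the file exists. Note that atexit handlers do not run if the process is killed by a signal or exits via os._exit(), and registering flush keeps each instance alive until exit, so a long cluster job may still want to call flush() explicitly or pass no_cache=True.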

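To see why loading an untrusted pickle file is risky, here is a small illustration (not part of the original answer) using the standard __reduce__ hook: pickle records the object as "call this function with these arguments", and pickle.loads performs that call. It assumes Python 3; the class name Payload is hypothetical, and print stands in for something harmful such as os.system.

```python
import pickle


class Payload(object):
    """Hypothetical malicious object, for demonstration only."""

    def __reduce__(self):
        # Tells pickle to "rebuild" this object by calling print(...).
        # An attacker could return (os.system, ('some command',)) instead.
        return (print, ('code executed during pickle.loads',))


data = pickle.dumps(Payload())

# The loading side only calls pickle.loads, yet print() runs, and the
# "restored" value is print's return value (None), not a Payload.
result = pickle.loads(data)
print(type(result))
```

A file written by FileDict is loaded the same way, so anyone who can replace the hidden .fd file can run code in the process that reads it.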
Context

StackExchange Code Review Q#157809, answer score: 3