HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Persisted Dictionary Implementation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
persisteddictionaryimplementation

Problem

I'm using this code in an app that has hundreds of daily users. The code works, and passes all of the test cases I have.

Occasionally some users report a problem that I don't get right to the bottom of, and there is a possibility it is related to this PermaDict. Can you see anything I may have missed in terms of robustness, or corner cases that may cause it to fail? (And of course, any other Code Review feedback is welcome).

```
import pickle, json, csv, os, shutil

class PermaDict(dict):
''' Persistent dictionary with an API compatible with shelve and anydbm.

The dict is kept in memory, so the dictionary operations run as fast as
a regular dictionary.

Write to disk is delayed until close or sync (similar to gdbm's fast mode).

Input file format is automatically discovered.
Output file format is selectable between pickle, json, and csv.
All three serialization formats are backed by fast C implementations.

'''

def __init__(self, filename, flag='c', mode=None, format='json', *args, **kwds):
self.flag = flag # r=readonly, c=create, or n=new
self.mode = mode # None or an octal triple like 0644
self.format = format # 'csv', 'json', or 'pickle'
self.filename = filename
if flag != 'n' and os.access(filename, os.R_OK):
fileobj = open(filename, 'rb') # 'rb' if format=='pickle' else 'r')
with fileobj:
self.load(fileobj)
dict.__init__(self, *args, **kwds)

def sync(self):
'Write dict to disk'
if self.flag == 'r':
return
filename = self.filename
tempname = filename + '.tmp'
try:
with open(tempname, 'wb') as fileobj: # if self.format=='pickle' else 'w') as fileobj:
self.dump(fileobj)
except Exception:
os.remove(tempname)
raise
shutil.move(tempname, self.filename) # atomic

Solution

This is overall look quite good for me. I personally prefer """ for comments.

Dependency Injection

Problem I see with PermaDict is that it needs to know about json/pickle/csv load and dump methods. This makes it harder for you to add any new serialisation without changing the code.

In Python functions are objects, so you can pass them as parameters to construction. You can even pass whole modules/classes, etc too.

def __init__(self, filename, flag='c', mode=None, format='json', *args, **kwds):


Instead of specifying 'json' as a string. We can simply pass json or any other object with .load and .dump methods.

def __init__(self, filename, flag='c', mode=None, serializer=json, *args, **kwds):


Now we just use self.serializer.dump and self.serializer.load. This way you can easily support multiple serializers.

for loader in (pickle.load, json.load, csv.reader):


I don't personally like this kind of try all loading. I'd avoid pickle and stick with json if it is possible due to security concerns.


Warning The pickle module is not secure against erroneous or
maliciously constructed data. Never unpickle data received from an
untrusted or unauthenticated source.

However since it is for your own pickled content it might be fine.

Designing for resilience

I suggest you give LMDB a try. You can find more information here: https://symas.com/lmdb/. There are also python bindings available. This should be easier than trying to come up with your own resilient data store.

To be resilient is hard work. To have some resilience maybe store 2 copies of same data, store data checksums or create backups that you can recover from. All hardwares, softwares, OSes, Python interpreters and C runtimes. are subject to failure.

Code Snippets

def __init__(self, filename, flag='c', mode=None, format='json', *args, **kwds):
def __init__(self, filename, flag='c', mode=None, serializer=json, *args, **kwds):
for loader in (pickle.load, json.load, csv.reader):

Context

StackExchange Code Review Q#54291, answer score: 3

Revisions (0)

No revisions yet.