HiveBrain v1.2.0
Get Started
← Back to all entries
principlepythonMinor

Compare dicts and record changes

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
dictsrecordchangesandcompare

Problem

My goal is to continually update an object and record any changes and the times they were made at. The input has to be JSON but the output/storage format doesn't.

This function will check a JSON object/dictionary against a reference object and return a list containing the updated object and another object containing just the updates.

I've tested it a bit and I think it handles inconsistencies in the objects, such as the old object not having the correct nesting.

It only compares new to old, so any missing keys in the new version won't be removed from the old version. It's not memory-impervious but there's no recursion so depth itself isn't a problem.

How can I make my approach more efficient? (Faster, less steps involved)

```
def getFromDict(dataDict, mapList):
return reduce(lambda d, k: d[k], mapList, dataDict)

def setInDict(dataDict, mapList, value):
getFromDict(dataDict, mapList[:-1])[mapList[-1]] = value

def record_changes(obj, keys, key, value):
for _ in xrange(len(keys)+10):
try:
setInDict(obj, keys+[key], value)
return obj
except:
for i in range(len(keys)+1)[1:]:
try:
assert isinstance(getFromDict(obj, keys[:i]), dict)
except:
setInDict(obj, keys[:i], {})
else:
raise Exception

def compare_object(N, record):
"""
:param:N new object
:param:record old object
"""
changes = {}
D = N.copy()
nested_keys = []
for key in D:
if isinstance(D[key], dict):
nested_keys.append([key])
else:
if D[key] != record.get(key, None):
record[key] = D[key]
changes[key] = D[key]
for keys in nested_keys:
d = getFromDict(N, keys)
for key in d:
if isinstance(d[key], dict):
nested_keys.append(keys+[key])
else:
try:
if d[key] != getFromDict

Solution

So you want a variation of dict.update that recursively update inner dicts; and you want it to record changes as you go.

I’m afraid that your solution is way too complicated for that:

  • compare_object has two for loops (for key in D and for key in d) that mainly perform the same kind of task. The former being an initialization to the outer loop of the latter.



  • compare_object copies its N parameter which is never modified and modify record in place (thus there is no need in returning it); which is at best misleading, at worse a bug.



  • record_change seems to be only an extremely verbose and convoluted way of calling setInDict while ensuring that each level of nesting will actually contain a dictionary.



As for the actual coding style:

  • Always specify the kind of exception you’re expecting in an except clause; as in you probably don't expect to catch IndexError when calling record_changes.



  • Raising Exception is also worth noting. You should either declare your own exception or raise a more specific one.



  • Iterating over keys in a dictionary and repeatedly calling d[key] reads poorly: it's better to iterate over the pairs of keys and values returned by items or iteritems (Python 2): for key, value in d.iteritems():.



  • As already stated around here AssertionErrors are not meant to control the flow of execution. If I run your code in optimized mode (-O switch on the command line), your code breaks.



Now before getting back to the algorithm, let me introduce you to dict.setdefault: it works pretty much the same than dict.get but instead of just returning the default value in case the key does not exist, it set the key to that default value if it doesn't exist before returning the value.

With that information in mind, let's take an empty file and build a function that update a dictionary and recursively update inner dictionaries.

-
The name should convey (at least part of) this:

def update_inner(orig, update):


-
The function should return values that actually changed:

change = {}
    ...
    return change


-
The function should update values of the original dict with values from the update dict (and still record changes):

change = {}
    for key, value in update.iteritems():
        if value != orig.get(key):
            orig[key] = change[key] = value
    return change


-
The function should recurse to update inner dictionaries:

def update_inner(orig, update):
    change = {}
    for key, value in update.iteritems():
        try:
            # Consider value is a dict, so recurse
            changed = update_inner(orig, value)
        except AttributeError:
            # We were wrong, value is not a dict, backup to simple update
            if orig.get(key) != value:
                orig[key] = change[key] = value
        else:
            # We were right, record the changes
            change[key] = changed
    return change


Wait no, there is something wrong… We are not keeping track of the keys used to go deeper into orig when doing the recursion… Find out why I talked about setdefault now?

Proposed improvements

def update_inner(orig, update):
    change = {}
    for key, value in update.iteritems():
        try:
            # Consider value is a dict, so recurse
            changed = update_inner(orig.setdefault(key, {}), value)
        except AttributeError:
            # We were wrong, value is not a dict, backup to simple update
            if orig[key] != value: # Direct access is OK thanks to setdefault above
                orig[key] = change[key] = value
        else:
            # We were right, record the changes
            change[key] = changed
    return change


You use it like so:

>>> record = {'a':{'b':2, 'q':0}, 'f':9}
>>> new = {'a':{'b':1, 'q':0}, 'c':5}
>>> update_inner(record, new)
{'a': {'b': 1}, 'c': 5}
>>> record
{'f': 9, 'a': {'q': 0, 'b': 1}, 'c': 5}

Code Snippets

def update_inner(orig, update):
change = {}
    ...
    return change
change = {}
    for key, value in update.iteritems():
        if value != orig.get(key):
            orig[key] = change[key] = value
    return change
def update_inner(orig, update):
    change = {}
    for key, value in update.iteritems():
        try:
            # Consider value is a dict, so recurse
            changed = update_inner(orig, value)
        except AttributeError:
            # We were wrong, value is not a dict, backup to simple update
            if orig.get(key) != value:
                orig[key] = change[key] = value
        else:
            # We were right, record the changes
            change[key] = changed
    return change
def update_inner(orig, update):
    change = {}
    for key, value in update.iteritems():
        try:
            # Consider value is a dict, so recurse
            changed = update_inner(orig.setdefault(key, {}), value)
        except AttributeError:
            # We were wrong, value is not a dict, backup to simple update
            if orig[key] != value: # Direct access is OK thanks to setdefault above
                orig[key] = change[key] = value
        else:
            # We were right, record the changes
            change[key] = changed
    return change

Context

StackExchange Code Review Q#113462, answer score: 2

Revisions (0)

No revisions yet.