
Functions to merge dictionaries with a comparison

Submitted by: @import:stackexchange-codereview

Problem

I have several functions for merging dictionaries, but over time I created a more general function that would make all the others obsolete if it weren't slower.

The specialized functions (I have several like this one) look like this:

def merge_keep_lowest(dict1, dict2, *dicts):
    """
    Merge an arbitrary number of :py:class:`dict`-like objects and keep the
    **lowest** encountered value for each key.

    Parameters
    ----------
    dict1, dict2, *dicts : dict-like
        The dictionaries to be merged. At least two must be given.

    Returns
    -------
    result : any type
        The merged dictionaries.
    """
    # Copy the first dict so the result's class and its properties are defined
    result = dict1.copy()

    # We only want to iterate once, so combine the second and the other dicts.
    dicts = (dict2,) + dicts

    # Now iterate over each dictionary, since there is no directly usable
    # dict method for this kind of operation.
    for d in dicts:
        # Now iterate over each key of this dict.
        # This way is faster than "for kw in d.keys()".
        for kw in d:
            # One could also use "try ... except KeyError ..." here instead of
            # the "if kw in result". That would be a bit faster if all dicts
            # contained mostly the same keys, but since membership checks on
            # dictionaries are relatively cheap, it doesn't make a huge
            # difference.
            if kw in result:
                # The key is already present in the result, so compare the
                # values and replace the stored one if the new one is smaller.
                if d[kw] < result[kw]:
                    result[kw] = d[kw]
            else:
                # The key is not yet in the result dict, so just initialize it.
                result[kw] = d[kw]

    return result


I have similar functions that keep the value if it is higher/shorter/longer/...; they all look exactly the same except for the if d[kw] < result[kw]: line.
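Since only that one comparison line differs, it can be passed in as a function. The following is a minimal sketch of such a general version, assuming the comparison is called as compare(new, old); the name merge_keep_by is hypothetical, and this is not the asker's actual general function, which is not shown here.

```python
import operator


def merge_keep_by(compare, dict1, dict2, *dicts):
    """Merge dicts, replacing a stored value when compare(new, old) is true.

    Hypothetical generalisation: compare is the only part that differed
    between the specialised merge functions.
    """
    # Copy the first dict, as in merge_keep_lowest.
    result = dict1.copy()
    for d in (dict2,) + dicts:
        for kw in d:
            # New keys are taken as-is; existing ones only if compare agrees.
            if kw not in result or compare(d[kw], result[kw]):
                result[kw] = d[kw]
    return result


lowest = merge_keep_by(operator.lt, {"a": 3, "b": 1}, {"a": 2, "c": 5})
print(lowest)   # {'a': 2, 'b': 1, 'c': 5}

longest = merge_keep_by(lambda new, old: len(new) > len(old),
                        {"x": "ab"}, {"x": "abc"})
print(longest)  # {'x': 'abc'}
```

The extra function call per clashing key is one plausible reason such a general version runs slower than the hard-coded comparison.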

Then I thought …

Solution

The version taking func is clearly the nicest; the version which always chooses the lowest is only really better if speed is actually needed.

There's no reason to force at least two parameters. Just use *dicts and default to {}. It's simpler and nicer.
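A minimal sketch of that signature for the lowest-value merge, assuming the result should still copy the first mapping when one is given (as the original did) and fall back to a plain {} when called with no arguments:

```python
def merge_keep_lowest(*dicts):
    # Copy the first mapping if there is one; otherwise start from {}.
    result = dicts[0].copy() if dicts else {}
    # dicts[1:] is simply empty when zero or one dict was passed.
    for d in dicts[1:]:
        for kw, value in d.items():
            if kw not in result or value < result[kw]:
                result[kw] = value
    return result


print(merge_keep_lowest())                            # {}
print(merge_keep_lowest({"a": 1}))                    # {'a': 1}
print(merge_keep_lowest({"a": 3}, {"a": 2, "b": 4}))  # {'a': 2, 'b': 4}
```

No special cases are needed for zero or one argument, which is what makes this version simpler than requiring dict1 and dict2.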

The _by_func suffix can simply be _by. The *_by_method variant is horrible and should be avoided: it's not even more general, and it's all stringly typed.

Note that lambda x, y: True if x < y else False is just lambda x, y: x < y, which in turn is just operator.lt.
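For ordinary values such as numbers and strings, x < y already evaluates to a bool, so the three spellings agree. A quick check:

```python
import operator

verbose = lambda x, y: True if x < y else False
plain = lambda x, y: x < y

# All three give the same bool for every pair.
for x, y in [(1, 2), (2, 1), (3, 3), ("a", "b")]:
    assert verbose(x, y) == plain(x, y) == operator.lt(x, y)
```

operator.lt is also implemented in C, so it avoids the overhead of a Python-level lambda call.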

Further, you should really accept a fold function rather than a comparator, so you can do things like

def merge_keep_lowest(*dicts):
    return merge_dicts_by(*dicts, fold=min)


but then also

import operator

def merge_counts(*dicts):
    # sum() expects an iterable, so a two-argument fold needs operator.add
    return merge_dicts_by(*dicts, fold=operator.add)


A fold would look like

result[kw] = fold(result[kw], d[kw])
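The answer does not spell out merge_dicts_by itself. A minimal sketch, assuming fold is a keyword-only argument called as fold(old, new) and that values for keys not yet in the result are copied over unchanged; merge_counts here uses operator.add because sum expects an iterable rather than two values.

```python
import operator


def merge_dicts_by(*dicts, fold):
    """Merge any number of dict-like objects, combining clashing values with fold.

    fold(old, new) gets the value already in the result and the value from
    the dict being merged, and returns the value to keep.
    """
    # Copy the first mapping so the inputs are not mutated; no dicts -> {}.
    result = dicts[0].copy() if dicts else {}
    for d in dicts[1:]:
        for kw in d:
            if kw in result:
                result[kw] = fold(result[kw], d[kw])
            else:
                result[kw] = d[kw]
    return result


def merge_keep_lowest(*dicts):
    return merge_dicts_by(*dicts, fold=min)


def merge_counts(*dicts):
    return merge_dicts_by(*dicts, fold=operator.add)


a = {"x": 3, "y": 1}
b = {"x": 2, "z": 4}
print(merge_keep_lowest(a, b))  # {'x': 2, 'y': 1, 'z': 4}
print(merge_counts(a, b))       # {'x': 5, 'y': 1, 'z': 4}
```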


and any comparator comp(new, old) can be turned into a fold with

lambda old, new: new if comp(new, old) else old
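The same conversion written as a reusable helper; comparator_to_fold is a hypothetical name, not part of any library:

```python
import operator


def comparator_to_fold(comp):
    """Turn a comparator comp(new, old) -> bool into a fold(old, new) -> value.

    The new value wins exactly when comp(new, old) is true.
    """
    def fold(old, new):
        return new if comp(new, old) else old
    return fold


keep_lower = comparator_to_fold(operator.lt)
keep_longer = comparator_to_fold(lambda new, old: len(new) > len(old))

print(keep_lower(5, 3))          # 3
print(keep_lower(3, 5))          # 3
print(keep_longer("ab", "abc"))  # abc
```

Any existing comparator-based merge can therefore be expressed through a fold-based merge_dicts_by without rewriting the comparators.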


For example, for a comparator of

lambda new, old: new <= old


one has

lambda old, new: new if new <= old else old


or, simply stated,

min
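Strictly, min is that fold up to ties: when several items are minimal, min returns the first one it encounters, so it keeps the old value on a tie, whereas new <= old hands the tie to the new value. A strict new < old comparator matches min exactly. A small check (fold_le and fold_lt are illustrative names):

```python
def fold_le(old, new):
    # Fold derived from the comparator "new <= old": a tie goes to new.
    return new if new <= old else old


def fold_lt(old, new):
    # Fold derived from the strict comparator "new < old": a tie keeps old.
    return new if new < old else old


# For distinct values all three pick the smaller one.
for old, new in [(1, 2), (2, 1), (-5, 0)]:
    assert fold_le(old, new) == fold_lt(old, new) == min(old, new)

# 1 and 1.0 compare equal but differ in type, which exposes the tie rule.
print(repr(min(1, 1.0)))      # 1
print(repr(fold_lt(1, 1.0)))  # 1
print(repr(fold_le(1, 1.0)))  # 1.0
```

For plain numbers the difference is invisible, so min is a fine drop-in for keeping the lowest value.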


Context

StackExchange Code Review Q#122406, answer score: 4
