Merging two lists of dicts based on the value of a specific key

Submitted by: @import:stackexchange-codereview

Problem

I'm combining two lists of dicts, based on the value of the key 'time'.

I have two lists, for example:

a=[{'time': '25 APR', 'total': 10, 'high': 10}, 
   {'time': '26 APR', 'total': 5, 'high': 5}]

b=[{'time': '24 APR', 'total': 10, 'high': 10}, 
   {'time': '26 APR', 'total': 15, 'high': 5}]


These lists contain values per day, for a specific source, say a and b.

I would like to add the dicts in the lists, based on the value of the key 'time'. So the end result would be:

c=[{'time': '24 APR', 'total': 10, 'high': 10}, 
   {'time': '25 APR', 'total': 10, 'high': 10}, 
   {'time': '26 APR', 'total': 20, 'high': 10}]


Notice that the results for 26 APR are added.

The way I do this now is as follows:

from collections import Counter
import itertools

lst = sorted(itertools.chain(a, b), key=lambda x: x['time'])

f = []

for k,v in itertools.groupby(lst, key=lambda x:x['time']):
    v = list(v)

    # Only merge when there is more than one item, because Counter addition
    # drops any keys whose counts are zero or negative.
    if len(v) > 1:
        e = Counter()
        for i in v:
            c = Counter(i)
            time = c.pop('time', None)
            e = e + c
        e['time'] = time
        f.append(dict(e))
    else:
        f.append(v[0])
print(f)


The result is correct:

[{'high': 10, 'total': 10, 'time': '24 APR'}, {'high': 10, 'total': 10, 'time': '25 APR'}, {'high': 10, 'total': 20, 'time': '26 APR'}]


But I wonder if it could be more efficient. Any ideas?

Solution

itertools.groupby() is a good start. To process each group, you want to take advantage of functools.reduce(). reduce() does the right thing, whether there is just one record in a group or multiple records.
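
To illustrate the single-record case, here is a minimal sketch. The add_totals helper and the sample records are hypothetical and exist only for this demonstration. The point is that reduce(), called without an initial value on a one-element sequence, returns that element without ever calling the combining function.

```python
from functools import reduce

def add_totals(rec_a, rec_b):
    # Hypothetical combiner for illustration: sums the 'total' field only.
    return {'time': rec_a['time'], 'total': rec_a['total'] + rec_b['total']}

single = [{'time': '24 APR', 'total': 10}]
pair = [{'time': '26 APR', 'total': 5}, {'time': '26 APR', 'total': 15}]

# One element, no initial value: add_totals is never called.
merged_single = reduce(add_totals, single)
# Two elements: add_totals is called once.
merged_pair = reduce(add_totals, pair)

print(merged_single)  # {'time': '24 APR', 'total': 10}
print(merged_pair)    # {'time': '26 APR', 'total': 20}
```

Note that for a one-element group the result is the original dict object itself, not a copy, so mutating it later also mutates the input record.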

Your code suffers from readability problems. One obvious issue is that a, b, c, d, e, and f are horribly meaningless variable names. Another problem is that the only way to see what it does is to trace through the code. (Well, it would help if you wrote the entire introductory text to this question as a giant comment, but ideally the code should be eloquent enough to speak for itself.)

Let's start with the goal of being able to write this:

a=[{'time': '25 APR', 'total': 10, 'high': 10}, 
   {'time': '26 APR', 'total': 5, 'high': 5}]

b=[{'time': '24 APR', 'total': 10, 'high': 10}, 
   {'time': '26 APR', 'total': 15, 'high': 5}]

merger = merge_list_of_records_by('time', add)
print(merger(a + b))


Then, it's a matter of writing a merge_list_of_records_by() function to make that happen.

from functools import reduce
from itertools import groupby
from operator import add, itemgetter

def merge_records_by(key, combine):
    """Returns a function that merges two records rec_a and rec_b.
       The records are assumed to have the same value for rec_a[key]
       and rec_b[key].  For all other keys, the values are combined
       using the specified binary operator.
    """
    return lambda rec_a, rec_b: {
        k: rec_a[k] if k == key else combine(rec_a[k], rec_b[k])
        for k in rec_a
    }

def merge_list_of_records_by(key, combine):
    """Returns a function that merges a list of records, grouped by
       the specified key, with values combined using the specified
       binary operator."""
    keyprop = itemgetter(key)
    return lambda lst: [
        reduce(merge_records_by(key, combine), records)
        for _, records in groupby(sorted(lst, key=keyprop), keyprop)
    ]
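
Since the question asks about efficiency, one possible alternative is to skip the sort and accumulate into a dict keyed by the grouping value. This is a different technique from the groupby()/reduce() approach above, not a restatement of it. The name merge_records_one_pass is hypothetical, and the sketch assumes, as the code above does, that every record has the same set of keys.

```python
from operator import add

def merge_records_one_pass(records, key, combine=add):
    """Hypothetical helper: merge records that share record[key] in one pass."""
    merged = {}  # grouping value -> accumulated record
    for record in records:
        group = record[key]
        if group not in merged:
            # Copy the first record of each group so the inputs are not mutated.
            merged[group] = dict(record)
        else:
            target = merged[group]
            for field, value in record.items():
                if field != key:
                    target[field] = combine(target[field], value)
    return list(merged.values())

a = [{'time': '25 APR', 'total': 10, 'high': 10},
     {'time': '26 APR', 'total': 5, 'high': 5}]

b = [{'time': '24 APR', 'total': 10, 'high': 10},
     {'time': '26 APR', 'total': 15, 'high': 5}]

result = merge_records_one_pass(a + b, 'time')
# Groups come out in order of first appearance, so sort if order matters.
result.sort(key=lambda record: record['time'])
print(result)
```

Whether this is actually faster depends on the data. It avoids sorting the full input, but the final sort over the merged groups is still needed if the output has to be ordered by 'time'.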

Context

StackExchange Code Review Q#85818, answer score: 8
