patternpythonMinor
Group a list of `dict` by all keys except one
Viewed 0 times
dictgroupallkeysonelistexcept
Problem
I have a function that I use to convert a list of dictionaries into a list of tuples, similar to
I couldn't find a thing on google, so I wrote it pretty quickly, and I am wondering if there is a better way to do this (or a standard library even)
Simple little python function:
Example Input/Output
```
from pprint import pprint
data = [
{'id': 1, 'status': 1, 'product': 1},
{'id': 2, 'status': 1, 'product': 1},
{'id': 7, 'status': 1, 'product': 2},
{'id': 9, 'status': 1, 'product': 2},
{'id': 3, 'status': 1, 'product': 1},
{'id': 4, 'status': 1, 'product': 1},
{'id': 8, 'status': 1, 'p
itertools.groupby() would do in an ideal world. The goal is to make a list of {unique-dict} => [list of values]. I'm having a hard time explaining it, so I hope the example makes it clear. I couldn't find a thing on google, so I wrote it pretty quickly, and I am wondering if there is a better way to do this (or a standard library even)
- Naming things is hard... what is a better one?
- This is a task I performed fairly often when receiving user input
- The keys in each dict may not be consistent (some may be missing)
- It is OK to destroy or mangle the original input list & items
- Validation is done before calling, so 'key_field' is always present
- It is expected that values can be duplicated in the list.
{ status: 1 } => [1,2,3,4,1,2]is ok, if the values are actually present twice.
Simple little python function:
def group_by_excluding_key(list_of_dicts, key_field):
"""
Takes a list of `dict` items and groups by ALL KEYS in the dict EXCEPT the key_field.
:param list_of_dicts: List of dicts to group
:param key_field: key field in dict which should be excluded from the grouping
"""
output = []
for item in list_of_dicts:
found = False
item_key = item.pop(key_field)
for existing_group, found_keys in output:
if existing_group.viewitems() == item.viewitems():
found_keys.append(item_key)
found = True
break
if not found:
output.append((item, [item_key]))
return outputExample Input/Output
```
from pprint import pprint
data = [
{'id': 1, 'status': 1, 'product': 1},
{'id': 2, 'status': 1, 'product': 1},
{'id': 7, 'status': 1, 'product': 2},
{'id': 9, 'status': 1, 'product': 2},
{'id': 3, 'status': 1, 'product': 1},
{'id': 4, 'status': 1, 'product': 1},
{'id': 8, 'status': 1, 'p
Solution
Your code is pretty good, there is one thing that I would add,
the
Which honestly is just noise.
This is as if a for loop runs completely without breaking then the
This can leave you with:
Other than that your code is good.
But if I were to were to write this, I'd prefer a very small solution.
Lets say
This obviously has two down-sides, it's not ordered, and dicts aren't hash able.
Both easily solved with
And so can result in:
This has the benefit of moving the
I know you didn't ask for a performance review, but the performance difference between my code and your code can be tested with the following. The comments are my functions run time over yours as a percentage, so if mine took 0.8s and yours 3.3s then it'll be 24%, followed by how long it took my function to run.
the
for-else keyword, as this gets rid of the found variable.Which honestly is just noise.
This is as if a for loop runs completely without breaking then the
else will run too. But if it breaks then it won't run the else.This can leave you with:
def group_by_excluding_key(list_of_dicts, key_field):
"""
Takes a list of `dict` items and groups by ALL KEYS in the dict EXCEPT the key_field.
:param list_of_dicts: List of dicts to group
:param key_field: key field in dict which should be excluded from the grouping
"""
output = []
for item in list_of_dicts:
item_key = item.pop(key_field)
for existing_group, found_keys in output:
if existing_group.viewitems() == item.viewitems():
found_keys.append(item_key)
break
else:
output.append((item, [item_key]))
return outputOther than that your code is good.
But if I were to were to write this, I'd prefer a very small solution.
Lets say
dicts are hash able, what you want is a dictionary that has the modified item as the key, and the popped item_key as the value.This obviously has two down-sides, it's not ordered, and dicts aren't hash able.
Both easily solved with
collections.OrderedDict and tuple(dict.items()).And so can result in:
from collections import OrderedDict
def group_by_excluding_key(list_of_dicts, key_field):
"""
Takes a list of `dict` items and groups by ALL KEYS in the dict EXCEPT the key_field.
:param list_of_dicts: List of dicts to group
:param key_field: key field in dict which should be excluded from the grouping
"""
output = OrderedDict()
for item in list_of_dicts:
key = item.pop(key_field)
output.setdefault(tuple(item.items()), []).append(key)
return [(dict(key), value) for key, value in output.items()]This has the benefit of moving the
for loop into the OrderedDict, and possibly getting \$O(1)\$ key lookup, but requires you to change the type of all the keys, twice.I know you didn't ask for a performance review, but the performance difference between my code and your code can be tested with the following. The comments are my functions run time over yours as a percentage, so if mine took 0.8s and yours 3.3s then it'll be 24%, followed by how long it took my function to run.
from timeit import timeit
from itertools import count
# 240%, 0.1s
c = count(1)
l = [{'status': 1, 'product': i, 'id': next(c)} for i in range(10)]
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key_dict as fn', number=1000))
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key as fn', number=1000))
# 25%, 0.8s
c = count(1)
l = [{'status': 1, 'product': i, 'id': next(c)} for i in range(100)]
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key_dict as fn', number=1000))
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key as fn', number=1000))
# 28%, 0.8s
c = count(1)
l = [{'status': i, 'product': j, 'id': next(c)} for i in range(10) for j in range(10)]
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key_dict as fn', number=1000))
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key as fn', number=1000))
# 0.4%, 0.9s
c = count(1)
l = [{'status': i, 'product': j, 'id': next(c)} for i in range(100) for j in range(100)]
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key_dict as fn', number=10))
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key as fn', number=10))Code Snippets
def group_by_excluding_key(list_of_dicts, key_field):
"""
Takes a list of `dict` items and groups by ALL KEYS in the dict EXCEPT the key_field.
:param list_of_dicts: List of dicts to group
:param key_field: key field in dict which should be excluded from the grouping
"""
output = []
for item in list_of_dicts:
item_key = item.pop(key_field)
for existing_group, found_keys in output:
if existing_group.viewitems() == item.viewitems():
found_keys.append(item_key)
break
else:
output.append((item, [item_key]))
return outputfrom collections import OrderedDict
def group_by_excluding_key(list_of_dicts, key_field):
"""
Takes a list of `dict` items and groups by ALL KEYS in the dict EXCEPT the key_field.
:param list_of_dicts: List of dicts to group
:param key_field: key field in dict which should be excluded from the grouping
"""
output = OrderedDict()
for item in list_of_dicts:
key = item.pop(key_field)
output.setdefault(tuple(item.items()), []).append(key)
return [(dict(key), value) for key, value in output.items()]from timeit import timeit
from itertools import count
# 240%, 0.1s
c = count(1)
l = [{'status': 1, 'product': i, 'id': next(c)} for i in range(10)]
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key_dict as fn', number=1000))
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key as fn', number=1000))
# 25%, 0.8s
c = count(1)
l = [{'status': 1, 'product': i, 'id': next(c)} for i in range(100)]
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key_dict as fn', number=1000))
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key as fn', number=1000))
# 28%, 0.8s
c = count(1)
l = [{'status': i, 'product': j, 'id': next(c)} for i in range(10) for j in range(10)]
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key_dict as fn', number=1000))
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key as fn', number=1000))
# 0.4%, 0.9s
c = count(1)
l = [{'status': i, 'product': j, 'id': next(c)} for i in range(100) for j in range(100)]
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key_dict as fn', number=10))
print(timeit('fn({!r}, "id")'.format(l), 'from __main__ import group_by_excluding_key as fn', number=10))Context
StackExchange Code Review Q#142134, answer score: 4
Revisions (0)
No revisions yet.