Find and process duplicates in list of lists
Problem
I'm trying to merge counts for items (URLs) in the list:
[['foo',1], ['bar',3],['foo',4]]
I came up with a function, but it's slow when I run it with 50k entries. I'd appreciate it if somebody could review it and suggest improvements.
def dedupe(data):
    ''' Finds duplicates in data and merges the counts '''
    result = []
    for row in data:
        url, count = row
        url_already_in_result = filter(lambda res_row: res_row[0] == url, result)
        if url_already_in_result:
            url_already_in_result[0][1] += count
        else:
            result.append(row)
    return result

def test_dedupe():
    data = [['foo',1], ['bar',3],['foo',4]]
    assert dedupe(data) == [['foo',5], ['bar',3]]
Solution
It looks like you could use collections.Counter, although you may want to use it earlier in your code, when you create the list of pairs you pass to dedupe. As is, you could use the following in your code:
from collections import Counter
def dedupe(data):
    result = Counter()
    for row in data:
        result.update(dict([row]))
    return result.items()
>>> data = [['foo',1], ['bar',3],['foo',4]]
>>> dedupe(data)
[('foo', 5), ('bar', 3)]
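For what it's worth, the original function is slow because it rescans result for every row, so the work grows quadratically with the number of entries, whereas a Counter lookup is a hash lookup. The output above is also what Python 2 prints; on Python 3, items() returns a view rather than a list, and filter() returns a lazy iterator that is always truthy and cannot be indexed. Below is a minimal sketch of a Python 3 variant that keeps the list-of-lists shape expected by test_dedupe. The name merge_counts is hypothetical, and the first-seen ordering assumes Python 3.7 or later.

```python
from collections import Counter


def merge_counts(data):
    """Sum the counts of rows that share a URL (hypothetical helper name)."""
    totals = Counter()
    for url, count in data:
        totals[url] += count  # a missing key starts at 0
    # Counter is a dict subclass, so on Python 3.7+ keys keep first-seen order.
    return [[url, total] for url, total in totals.items()]


data = [['foo', 1], ['bar', 3], ['foo', 4]]
print(merge_counts(data))  # [['foo', 5], ['bar', 3]]
```

Unlike the original dedupe, this sketch builds new rows instead of appending the input rows to the result, so the lists inside data are not modified.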
Context
StackExchange Code Review Q#24458, answer score: 6