patternpythonMinor
Python list comparison: Can this code run faster?
Viewed 0 times
thiscancomparisonfasterpythoncodelistrun
Problem
The idea is to apply some function on each element in each list, then compare two lists by the value returned by the function.
My current solution works but is not fast enough. running "python -m cProfile" gives sth. like:
Here is my code, I would like to know how to optimize it to run faster.
```
import itertools
def compare_lists(li1, li2, value_func1=None, value_func2=None):
""" Compare li1 and li2, return the results as a list in the following form:
[[data seen in both lists], [data only seen in li1], [data only seen in li2]]
and [data seen in both lists] contains 2 tuple: [(actual items in li1), (actual items in li2)]
value_func1* callback function to li1, applied to each item in the list, returning the logical value for comparison
value_func2* callback function to li2, similarly
If not supplied, lists will be compared as it is.
Usage::
>>> compare_lists([1, 2, 3], [1, 3, 5])
>>> ([(1, 3), (1, 3)], [2], [5])
Or with callback functions specified::
>>> f = lambda x: x['v']
>>>
>>> li1 = [{'v': 1}, {'v': 2}, {'v': 3}]
>>> li2 = [1, 3, 5]
>>>
>>> compare_lists(li1, li2, value_func1=f)
>>> ([({'v': 1}, {'v': 3}), (1, 3)], [{'v': 2}], [5])
"""
if not value_func1:
value_func1 = (lambda x:x)
if not value_func2:
value_func2 = (lambda x:x)
def to_dict(li, vfunc):
return dict((k, list(g)) for k, g in itertools.groupby(li, vfunc))
def flatten(li):
return reduce(list.__add__, li) if li else []
d1 = to_dict(li1, value_func1)
d2 = to_dict(li2, value_func2)
if d1 == d2:
return (li1, li2), [],
My current solution works but is not fast enough. running "python -m cProfile" gives sth. like:
ncalls tottime percall cumtime percall filename:lineno(function)
2412505 13.335 0.000 23.633 0.000 common.py:38()
285000 5.434 0.000 29.067 0.000 common.py:37(to_dict)
142500 3.392 0.000 35.948 0.000 common.py:3(compare_lists)Here is my code, I would like to know how to optimize it to run faster.
```
import itertools
def compare_lists(li1, li2, value_func1=None, value_func2=None):
""" Compare li1 and li2, return the results as a list in the following form:
[[data seen in both lists], [data only seen in li1], [data only seen in li2]]
and [data seen in both lists] contains 2 tuple: [(actual items in li1), (actual items in li2)]
value_func1* callback function to li1, applied to each item in the list, returning the logical value for comparison
value_func2* callback function to li2, similarly
If not supplied, lists will be compared as it is.
Usage::
>>> compare_lists([1, 2, 3], [1, 3, 5])
>>> ([(1, 3), (1, 3)], [2], [5])
Or with callback functions specified::
>>> f = lambda x: x['v']
>>>
>>> li1 = [{'v': 1}, {'v': 2}, {'v': 3}]
>>> li2 = [1, 3, 5]
>>>
>>> compare_lists(li1, li2, value_func1=f)
>>> ([({'v': 1}, {'v': 3}), (1, 3)], [{'v': 2}], [5])
"""
if not value_func1:
value_func1 = (lambda x:x)
if not value_func2:
value_func2 = (lambda x:x)
def to_dict(li, vfunc):
return dict((k, list(g)) for k, g in itertools.groupby(li, vfunc))
def flatten(li):
return reduce(list.__add__, li) if li else []
d1 = to_dict(li1, value_func1)
d2 = to_dict(li2, value_func2)
if d1 == d2:
return (li1, li2), [],
Solution
So it seems you primarily want to use your callback functions to pull the value to compare out of an object, if you don't mind simplifying how compared items are returned, a faster and simpler approach would be:
Which depending on actual use case, might be something you could live with. It's definitely a lot faster:
def simple_compare(li1, li2, value_func1=None, value_func2=None):
s1, s2 = set(map(value_func1, li1)), set(map(value_func2, li2))
common = s1.intersection(s2)
s1_diff, s2_diff = s1.difference(s2), s2.difference(s1)
return common, s1_diff, s2_diff
>>> simple_compare(li1, li2, value_func1=f)
>> compare_lists(li1, li2, value_func1=f)
<<< (([{'v': 1}, {'v': 3}], [1, 3]), [{'v': 2}], [5])Which depending on actual use case, might be something you could live with. It's definitely a lot faster:
>>> timeit x = simple_compare(xrange(10000), xrange(10000))
100 loops, best of 3: 2.3 ms per loop
>>> timeit x = compare_lists(xrange(10000), xrange(10000))
10 loops, best of 3: 53.1 ms per loopCode Snippets
def simple_compare(li1, li2, value_func1=None, value_func2=None):
s1, s2 = set(map(value_func1, li1)), set(map(value_func2, li2))
common = s1.intersection(s2)
s1_diff, s2_diff = s1.difference(s2), s2.difference(s1)
return common, s1_diff, s2_diff
>>> simple_compare(li1, li2, value_func1=f)
<<< (set([1, 3]), set([2]), set([5]))
>>> compare_lists(li1, li2, value_func1=f)
<<< (([{'v': 1}, {'v': 3}], [1, 3]), [{'v': 2}], [5])>>> timeit x = simple_compare(xrange(10000), xrange(10000))
100 loops, best of 3: 2.3 ms per loop
>>> timeit x = compare_lists(xrange(10000), xrange(10000))
10 loops, best of 3: 53.1 ms per loopContext
StackExchange Code Review Q#4757, answer score: 3
Revisions (0)
No revisions yet.