HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Python list comparison: Can this code run faster?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thiscancomparisonfasterpythoncodelistrun

Problem

The idea is to apply some function on each element in each list, then compare two lists by the value returned by the function.

My current solution works but is not fast enough. running "python -m cProfile" gives sth. like:

ncalls    tottime  percall  cumtime  percall filename:lineno(function)
  2412505   13.335   0.000    23.633   0.000   common.py:38()
  285000    5.434    0.000    29.067   0.000   common.py:37(to_dict)
  142500    3.392    0.000    35.948   0.000   common.py:3(compare_lists)


Here is my code, I would like to know how to optimize it to run faster.

```
import itertools

def compare_lists(li1, li2, value_func1=None, value_func2=None):
""" Compare li1 and li2, return the results as a list in the following form:

[[data seen in both lists], [data only seen in li1], [data only seen in li2]]

and [data seen in both lists] contains 2 tuple: [(actual items in li1), (actual items in li2)]

value_func1* callback function to li1, applied to each item in the list, returning the logical value for comparison
value_func2* callback function to li2, similarly

If not supplied, lists will be compared as it is.

Usage::

>>> compare_lists([1, 2, 3], [1, 3, 5])
>>> ([(1, 3), (1, 3)], [2], [5])

Or with callback functions specified::

>>> f = lambda x: x['v']
>>>
>>> li1 = [{'v': 1}, {'v': 2}, {'v': 3}]
>>> li2 = [1, 3, 5]
>>>
>>> compare_lists(li1, li2, value_func1=f)
>>> ([({'v': 1}, {'v': 3}), (1, 3)], [{'v': 2}], [5])

"""

if not value_func1:
value_func1 = (lambda x:x)
if not value_func2:
value_func2 = (lambda x:x)

def to_dict(li, vfunc):
return dict((k, list(g)) for k, g in itertools.groupby(li, vfunc))

def flatten(li):
return reduce(list.__add__, li) if li else []

d1 = to_dict(li1, value_func1)
d2 = to_dict(li2, value_func2)

if d1 == d2:
return (li1, li2), [],

Solution

So it seems you primarily want to use your callback functions to pull the value to compare out of an object, if you don't mind simplifying how compared items are returned, a faster and simpler approach would be:

def simple_compare(li1, li2, value_func1=None, value_func2=None):
        s1, s2 = set(map(value_func1, li1)), set(map(value_func2, li2))
        common = s1.intersection(s2)
        s1_diff, s2_diff = s1.difference(s2), s2.difference(s1)
        return common, s1_diff, s2_diff

>>> simple_compare(li1, li2, value_func1=f)
>> compare_lists(li1, li2, value_func1=f)
<<< (([{'v': 1}, {'v': 3}], [1, 3]), [{'v': 2}], [5])


Which depending on actual use case, might be something you could live with. It's definitely a lot faster:

>>> timeit x = simple_compare(xrange(10000), xrange(10000))
100 loops, best of 3: 2.3 ms per loop

>>> timeit x = compare_lists(xrange(10000), xrange(10000))
10 loops, best of 3: 53.1 ms per loop

Code Snippets

def simple_compare(li1, li2, value_func1=None, value_func2=None):
        s1, s2 = set(map(value_func1, li1)), set(map(value_func2, li2))
        common = s1.intersection(s2)
        s1_diff, s2_diff = s1.difference(s2), s2.difference(s1)
        return common, s1_diff, s2_diff

>>> simple_compare(li1, li2, value_func1=f)
<<< (set([1, 3]), set([2]), set([5]))

>>> compare_lists(li1, li2, value_func1=f)
<<< (([{'v': 1}, {'v': 3}], [1, 3]), [{'v': 2}], [5])
>>> timeit x = simple_compare(xrange(10000), xrange(10000))
100 loops, best of 3: 2.3 ms per loop

>>> timeit x = compare_lists(xrange(10000), xrange(10000))
10 loops, best of 3: 53.1 ms per loop

Context

StackExchange Code Review Q#4757, answer score: 3

Revisions (0)

No revisions yet.