Finding the mode of an array/iterable
Problem
I'm currently using SciPy's mode function to find the most frequently occurring item in different iterable objects. I like the mode function because it works on every object type I've thrown at it (strings, floats, ints).
While this function seems reliable, it is very slow on bigger lists:
from scipy.stats import mode
li = list(range(50000))  # list() so that item assignment also works on Python 3
li[1] = 0
%timeit mode(li)
1 loops, best of 3: 7.55 s per loop
Is there a better way to get the mode for a list? If so, would the implementation be different depending on the item type?
Solution
Originally I thought you must have been doing something wrong, but no: the scipy.stats implementation of mode scales with the number of unique elements, and so will behave very badly for your test case, where almost every element is unique.
As long as your objects are hashable (which includes your three listed object types of strings, floats, and ints), then probably the simplest approach is to use collections.Counter and its most_common method:
In [33]: import scipy.stats
In [34]: li = list(range(50000)); li[1] = 0
In [35]: scipy.stats.mode(li)
Out[35]: (array([ 0.]), array([ 2.]))
In [36]: timeit scipy.stats.mode(li)
1 loops, best of 3: 10.7 s per loop
but
In [37]: from collections import Counter
In [38]: Counter(li).most_common(1)
Out[38]: [(0, 2)]
In [39]: timeit Counter(li).most_common(1)
10 loops, best of 3: 34.1 ms per loop
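For use outside an IPython session, the Counter approach can be wrapped in a small helper. What follows is only a minimal sketch, assuming Python 3 and hashable items. The names mode_counter and mode_per_unique are hypothetical and belong to neither SciPy nor the standard library. In particular, mode_per_unique is not SciPy's code: it just illustrates how an algorithm that rescans the data once per unique value ends up scaling with the number of unique elements, whereas Counter tallies everything in a single pass.

```python
from collections import Counter


def mode_counter(iterable):
    """Return (item, count) for the most common item, counting in one pass.

    Assumes the items are hashable. Raises ValueError on empty input.
    """
    counts = Counter(iterable)
    if not counts:
        raise ValueError("mode of an empty iterable is undefined")
    # most_common(1) returns a one-element list of (item, count) pairs.
    return counts.most_common(1)[0]


def mode_per_unique(items):
    """Illustrative only: one full scan of the data per unique value.

    Hypothetical helper, not SciPy's implementation. The work grows as
    len(items) * number_of_unique_values.
    """
    items = list(items)
    if not items:
        raise ValueError("mode of an empty iterable is undefined")
    best_item, best_count = None, 0
    for candidate in set(items):
        # Rescan the whole list to count this one candidate.
        count = sum(1 for x in items if x == candidate)
        if count > best_count:
            best_item, best_count = candidate, count
    return best_item, best_count


# Smaller than the 50000 used in the question, so the slow version stays quick.
li = list(range(1000))
li[1] = 0  # 0 now appears twice; every other value appears once

print(mode_counter(li))
print(mode_per_unique(li))
```

Both helpers report the same item and count here, but only the second one slows down sharply as the number of distinct values grows. If several items tie for the highest count, which one is returned depends on the implementation, so callers that care about ties should inspect the counts directly.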
Context
StackExchange Code Review Q#20030, answer score: 5