
Finding the mode of an array/iterable

Submitted by: @import:stackexchange-codereview

Problem

I'm currently using SciPy's mode function to find the most frequently occurring item in various iterable objects. I like the mode function because it works on every object type I've thrown at it (strings, floats, ints).

While this function seems reliable, it is very slow on larger lists:

from scipy.stats import mode

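li = list(range(50000))  # on Python 3, range() is immutable, so build a list
li[1] = 0                # 0 now appears twice; every other value appears once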

%timeit mode(li)
1 loops, best of 3: 7.55 s per loop


Is there a better way to get the mode for a list? If so, would the implementation be different depending on the item type?

Solution

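At first I thought you must have been doing something wrong, but no: the scipy.stats implementation of mode makes a full pass over the data for every unique element, so its cost grows with the number of unique values. That makes it behave very badly for your test case, where 49,999 of the 50,000 values are unique.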
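To make that scaling concrete, here is a simplified pure-Python sketch of a per-unique-value strategy (one full scan of the data for each distinct value, so roughly n × k comparisons for n items and k unique values) next to a single-pass count. Both function names are hypothetical, and the slow one only illustrates the strategy; it is not SciPy's actual source. The list is shortened to 2,000 items so the slow version finishes quickly.

```python
from collections import Counter


def mode_by_unique_scan(items):
    """Hypothetical illustration of the O(n * k) strategy; not SciPy's code."""
    best_value, best_count = None, 0
    for value in set(items):                               # k unique values...
        count = sum(1 for item in items if item == value)  # ...each costs a full O(n) scan
        if count > best_count:
            best_value, best_count = value, count
    return best_value, best_count


def mode_by_counter(items):
    """Single O(n) pass: count every item once, then take the largest count."""
    return Counter(items).most_common(1)[0]


li = list(range(2000))
li[1] = 0  # 0 now appears twice; every other value appears once
print(mode_by_unique_scan(li))
print(mode_by_counter(li))
```

Both calls find the same answer, but the first does about 2,000 × 1,999 comparisons while the second touches each item once, which is why the gap widens so quickly as the number of unique values grows.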

As long as your objects are hashable (which includes all three of the types you listed: strings, floats, and ints), probably the simplest approach is to use collections.Counter and its most_common method:

In [33]: import scipy.stats

In [34]: li = list(range(50000)); li[1] = 0

In [35]: scipy.stats.mode(li)
Out[35]: (array([ 0.]), array([ 2.]))

In [36]: timeit scipy.stats.mode(li)
1 loops, best of 3: 10.7 s per loop


but

In [37]: from collections import Counter

In [38]: Counter(li).most_common(1)
Out[38]: [(0, 2)]

In [39]: timeit Counter(li).most_common(1)
10 loops, best of 3: 34.1 ms per loop
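If you want a reusable helper that returns a scipy-like (value, count) pair from any iterable of hashable items, including generators, a small wrapper around Counter is enough. `most_common_item` is a hypothetical name; the sketch assumes Python 3.7+ for the tie-breaking behavior described in the docstring.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def most_common_item(iterable: Iterable[Hashable]) -> Tuple[Hashable, int]:
    """Return (item, count) for the most frequent item in an iterable of hashables.

    Ties go to the item encountered first, because Counter.most_common orders
    equal counts by first appearance (Python 3.7+).
    """
    counts = Counter(iterable)  # one pass over the data
    if not counts:
        raise ValueError("most_common_item() arg is an empty iterable")
    return counts.most_common(1)[0]


print(most_common_item("mississippi"))            # 'i' and 's' tie; 'i' appears first
print(most_common_item([3.0, 1.0, 3.0]))
print(most_common_item(x % 7 for x in range(50)))  # works on generators too
```

Two caveats: values that compare and hash equal are counted together, so `1` and `1.0` count as the same item; and unhashable items such as lists raise TypeError, so they need converting (for example to tuples) first. If you only need the value itself, the standard library's `statistics.mode` is another option; since Python 3.8 it returns the first mode encountered when there is a tie.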


Context

StackExchange Code Review Q#20030, answer score: 5
