patternpythonMinor
Optimizing numpy gmean calculation
Viewed 0 times
gmeancalculationoptimizingnumpy
Problem
I'm trying to come up with an alternative to the
While these both outperform
Note: I'm testing this on arrays of approximately 5000 entries.
gmean implementation in scipy, because it is awkwardly slow. To that end I've been looking into alternate calculation methods and implementing them in numpy. My only issue is that a method that in the two methods I'm implementing, the one that I feel ought to be faster is in fact much slower. I feel like this is an issue with my implementation and would love advice on improving it.def fast_gmean(vector, chunk_size=1000):
base, exponent = np.frexp(vector)
exponent_sum = float(np.sum(exponent))
while len(base) > 1:
base = np.array_split(base, math.ceil(float(base.size)/chunk_size))
intermediates = np.array([np.prod(split) for split in base])
base, current_exponent = np.frexp(intermediates)
exponent_sum += np.sum(current_exponent)
return (base[0]**(1.0/vector.size)) * (2**(exponent_sum/vector.size))
def actually_fast_gmean(vector):
return np.exp(np.mean(np.log(vector)))While these both outperform
scipy's gmean implementation, the second method is about 33% faster than the first.Note: I'm testing this on arrays of approximately 5000 entries.
Solution
- Checking your claim
You claim that "these both outperform
scipy's gmean implementation", but I can't substantiate this. For example:>>> import numpy
>>> data = numpy.random.exponential(size=5000)
>>> from timeit import timeit
>>> timeit(lambda:fast_gmean(data), number=10000)
5.540040018968284
>>> timeit(lambda:actually_fast_gmean(data), number=10000)
1.4999530320055783
>>> from scipy.stats import gmean
>>> timeit(lambda:gmean(data), number=10000)
1.4939542019274086
So as far as I can tell, there's no significant difference in runtime between your
actually_fast_gmean and scipy.stats.gmean, and your fast_gmean is more than 3 times slower.So I think you need to give us more information. What's the basis for your claim about performance? What kind of test data are you using?
(Update: in comments it turned out that you were using
scipy.stats.mstats.gmean, which is a version of gmean specialized for masked arrays.)- Read the source!
If you look at the source code for
scipy.stats.gmean, you'll see that it's almost exactly the same as your actually_fast_gmean, except that it's more general (it takes dtype and axis arguments):def gmean(a, axis=0, dtype=None):
if not isinstance(a, np.ndarray): # if not an ndarray object attempt to convert it
log_a = np.log(np.array(a, dtype=dtype))
elif dtype: # Must change the default dtype allowing array type
if isinstance(a,np.ma.MaskedArray):
log_a = np.log(np.ma.asarray(a, dtype=dtype))
else:
log_a = np.log(np.asarray(a, dtype=dtype))
else:
log_a = np.log(a)
return np.exp(log_a.mean(axis=axis))So it's not surprising that these two functions have almost identical runtimes.
- Why
fast_gmeanis slow
Your strategy is to avoid calls to
log by performing arithmetic on the exponent and mantissa parts of the floating-point numbers.Very roughly speaking, for each element of the input, you avoid one call to each of
log and mean, and gain one call to each of frexp, sum, array_split and prod.>>> from numpy import log, mean, frexp, sum, array_split, prod
>>> for f in log, mean, frexp, sum, prod:
... print(f.__name__, timeit(lambda:f(data), number=10000))
log 1.0724926821421832
mean 0.3662677980028093
frexp 0.34479621006175876
sum 0.21649421006441116
prod 0.280590218026191
>>> timeit(lambda:array_split(data, 5), number=10000)
2.1635821380186826
So it's the call to
numpy.array_split that's costly. You could avoid this call and split the array yourself, like this:def fast_gmean2(vector, chunk_size=1000):
base, exponent = np.frexp(vector)
exponent_sum = np.sum(exponent)
while base.size > 1:
intermediates = []
for i in range(0, base.size, chunk_size):
intermediates.append(np.prod(base[i:i + chunk_size]))
base, current_exponent = np.frexp(np.array(intermediates))
exponent_sum += np.sum(current_exponent)
return base[0] ** (1.0/vector.size) * 2 ** (exponent_sum/vector.size)and this is roughly twice as fast as your version:
>>> timeit(lambda:fast_gmean2(data), number=10000)
2.585187505930662
but still about twice as slow as
scipy.stats.gmean, and that's because of the Python interpreter overhead. Numpy has a speed advantage whenever you can vectorize your operations so that they run on fixed-size datatypes in the Numpy core (which is implemented in C for speed). If you can't vectorize your operations, but have to loop over them in Python, then you pay a penalty.So let's vectorize that:
def fast_gmean3(vector, chunk_size=1000):
base, exponent = np.frexp(vector)
exponent_sum = np.sum(exponent)
while len(base) > chunk_size:
base = np.r_[base, np.ones(-len(base) % chunk_size)]
intermediates = base.reshape(chunk_size, -1).prod(axis=0)
base, current_exponent = np.frexp(intermediates)
exponent_sum += np.sum(current_exponent)
if len(base) > 1:
base, current_exponent = np.frexp([base.prod()])
exponent_sum += np.sum(current_exponent)
return base[0] ** (1.0/vector.size) * 2 ** (exponent_sum/vector.size)For arrays of the size we've been testing (about 5000), this is a little slower than
fast_gmean2:>>> timeit(lambda:fast_gmean3(data), number=10000)
2.8020136120030656
But for larger arrays it beats
gmean:>>> bigdata = np.random.exponential(size=1234567)
>>> timeit(lambda:gmean(bigdata), number=100)
3.192410137009574
>>> timeit(lambda:fast_gmean3(bigdata), number=100)
2.3945167789934203
So the fastest implementation depends on the length of the array.
- Other comments on
fast_gmean
-
There's no docstring. What does this function do and how do I call it? What value should I pass in for the
chunk_size argument?-
It's critical that
chunk_size is not too large, otherwise the call to prod could underflow and the result will be incorrect. So there needs to be a check that the vCode Snippets
def gmean(a, axis=0, dtype=None):
if not isinstance(a, np.ndarray): # if not an ndarray object attempt to convert it
log_a = np.log(np.array(a, dtype=dtype))
elif dtype: # Must change the default dtype allowing array type
if isinstance(a,np.ma.MaskedArray):
log_a = np.log(np.ma.asarray(a, dtype=dtype))
else:
log_a = np.log(np.asarray(a, dtype=dtype))
else:
log_a = np.log(a)
return np.exp(log_a.mean(axis=axis))def fast_gmean2(vector, chunk_size=1000):
base, exponent = np.frexp(vector)
exponent_sum = np.sum(exponent)
while base.size > 1:
intermediates = []
for i in range(0, base.size, chunk_size):
intermediates.append(np.prod(base[i:i + chunk_size]))
base, current_exponent = np.frexp(np.array(intermediates))
exponent_sum += np.sum(current_exponent)
return base[0] ** (1.0/vector.size) * 2 ** (exponent_sum/vector.size)def fast_gmean3(vector, chunk_size=1000):
base, exponent = np.frexp(vector)
exponent_sum = np.sum(exponent)
while len(base) > chunk_size:
base = np.r_[base, np.ones(-len(base) % chunk_size)]
intermediates = base.reshape(chunk_size, -1).prod(axis=0)
base, current_exponent = np.frexp(intermediates)
exponent_sum += np.sum(current_exponent)
if len(base) > 1:
base, current_exponent = np.frexp([base.prod()])
exponent_sum += np.sum(current_exponent)
return base[0] ** (1.0/vector.size) * 2 ** (exponent_sum/vector.size)Context
StackExchange Code Review Q#40067, answer score: 7
Revisions (0)
No revisions yet.