patternpythonMinor

Optimizing numpy gmean calculation

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

gmeancalculationoptimizingnumpy

Problem

I'm trying to come up with an alternative to the gmean implementation in scipy, because it is awkwardly slow. To that end I've been looking into alternate calculation methods and implementing them in numpy. My only issue is that a method that in the two methods I'm implementing, the one that I feel ought to be faster is in fact much slower. I feel like this is an issue with my implementation and would love advice on improving it.

def fast_gmean(vector, chunk_size=1000):
    base, exponent = np.frexp(vector)
    exponent_sum = float(np.sum(exponent))
    while len(base) > 1:
        base = np.array_split(base, math.ceil(float(base.size)/chunk_size))
        intermediates = np.array([np.prod(split) for split in base])
        base, current_exponent = np.frexp(intermediates)
        exponent_sum += np.sum(current_exponent)
    return (base[0]**(1.0/vector.size)) * (2**(exponent_sum/vector.size))

def actually_fast_gmean(vector):
    return np.exp(np.mean(np.log(vector)))

While these both outperform scipy's gmean implementation, the second method is about 33% faster than the first.

Note: I'm testing this on arrays of approximately 5000 entries.

Solution

Checking your claim

You claim that "these both outperform scipy's gmean implementation", but I can't substantiate this. For example:

>>> import numpy
>>> data = numpy.random.exponential(size=5000)
>>> from timeit import timeit
>>> timeit(lambda:fast_gmean(data), number=10000)
5.540040018968284
>>> timeit(lambda:actually_fast_gmean(data), number=10000)
1.4999530320055783
>>> from scipy.stats import gmean
>>> timeit(lambda:gmean(data), number=10000)
1.4939542019274086

So as far as I can tell, there's no significant difference in runtime between your actually_fast_gmean and scipy.stats.gmean, and your fast_gmean is more than 3 times slower.

So I think you need to give us more information. What's the basis for your claim about performance? What kind of test data are you using?

(Update: in comments it turned out that you were using scipy.stats.mstats.gmean, which is a version of gmean specialized for masked arrays.)

Read the source!

If you look at the source code for scipy.stats.gmean, you'll see that it's almost exactly the same as your actually_fast_gmean, except that it's more general (it takes dtype and axis arguments):

def gmean(a, axis=0, dtype=None):
    if not isinstance(a, np.ndarray):  # if not an ndarray object attempt to convert it
        log_a = np.log(np.array(a, dtype=dtype))
    elif dtype:  # Must change the default dtype allowing array type
        if isinstance(a,np.ma.MaskedArray):
            log_a = np.log(np.ma.asarray(a, dtype=dtype))
        else:
            log_a = np.log(np.asarray(a, dtype=dtype))
    else:
        log_a = np.log(a)
    return np.exp(log_a.mean(axis=axis))

So it's not surprising that these two functions have almost identical runtimes.

Why fast_gmean is slow

Your strategy is to avoid calls to log by performing arithmetic on the exponent and mantissa parts of the floating-point numbers.

Very roughly speaking, for each element of the input, you avoid one call to each of log and mean, and gain one call to each of frexp, sum, array_split and prod.

>>> from numpy import log, mean, frexp, sum, array_split, prod
>>> for f in log, mean, frexp, sum, prod:
...     print(f.__name__, timeit(lambda:f(data), number=10000))
log 1.0724926821421832
mean 0.3662677980028093
frexp 0.34479621006175876
sum 0.21649421006441116
prod 0.280590218026191
>>> timeit(lambda:array_split(data, 5), number=10000)
2.1635821380186826

So it's the call to numpy.array_split that's costly. You could avoid this call and split the array yourself, like this:

def fast_gmean2(vector, chunk_size=1000):
    base, exponent = np.frexp(vector)
    exponent_sum = np.sum(exponent)
    while base.size > 1:
        intermediates = []
        for i in range(0, base.size, chunk_size):
            intermediates.append(np.prod(base[i:i + chunk_size]))
        base, current_exponent = np.frexp(np.array(intermediates))
        exponent_sum += np.sum(current_exponent)
    return base[0] ** (1.0/vector.size) * 2 ** (exponent_sum/vector.size)

and this is roughly twice as fast as your version:

>>> timeit(lambda:fast_gmean2(data), number=10000)
2.585187505930662

but still about twice as slow as scipy.stats.gmean, and that's because of the Python interpreter overhead. Numpy has a speed advantage whenever you can vectorize your operations so that they run on fixed-size datatypes in the Numpy core (which is implemented in C for speed). If you can't vectorize your operations, but have to loop over them in Python, then you pay a penalty.

So let's vectorize that:

def fast_gmean3(vector, chunk_size=1000):
    base, exponent = np.frexp(vector)
    exponent_sum = np.sum(exponent)
    while len(base) > chunk_size:
        base = np.r_[base, np.ones(-len(base) % chunk_size)]
        intermediates = base.reshape(chunk_size, -1).prod(axis=0)
        base, current_exponent = np.frexp(intermediates)
        exponent_sum += np.sum(current_exponent)
    if len(base) > 1:
        base, current_exponent = np.frexp([base.prod()])
        exponent_sum += np.sum(current_exponent)
    return base[0] ** (1.0/vector.size) * 2 ** (exponent_sum/vector.size)

For arrays of the size we've been testing (about 5000), this is a little slower than fast_gmean2:

>>> timeit(lambda:fast_gmean3(data), number=10000)
2.8020136120030656

But for larger arrays it beats gmean:

>>> bigdata = np.random.exponential(size=1234567)
>>> timeit(lambda:gmean(bigdata), number=100)
3.192410137009574
>>> timeit(lambda:fast_gmean3(bigdata), number=100)
2.3945167789934203

So the fastest implementation depends on the length of the array.

Other comments on fast_gmean

-
There's no docstring. What does this function do and how do I call it? What value should I pass in for the chunk_size argument?

-
It's critical that chunk_size is not too large, otherwise the call to prod could underflow and the result will be incorrect. So there needs to be a check that the v

Code Snippets

def gmean(a, axis=0, dtype=None):
    if not isinstance(a, np.ndarray):  # if not an ndarray object attempt to convert it
        log_a = np.log(np.array(a, dtype=dtype))
    elif dtype:  # Must change the default dtype allowing array type
        if isinstance(a,np.ma.MaskedArray):
            log_a = np.log(np.ma.asarray(a, dtype=dtype))
        else:
            log_a = np.log(np.asarray(a, dtype=dtype))
    else:
        log_a = np.log(a)
    return np.exp(log_a.mean(axis=axis))

def fast_gmean2(vector, chunk_size=1000):
    base, exponent = np.frexp(vector)
    exponent_sum = np.sum(exponent)
    while base.size > 1:
        intermediates = []
        for i in range(0, base.size, chunk_size):
            intermediates.append(np.prod(base[i:i + chunk_size]))
        base, current_exponent = np.frexp(np.array(intermediates))
        exponent_sum += np.sum(current_exponent)
    return base[0] ** (1.0/vector.size) * 2 ** (exponent_sum/vector.size)

def fast_gmean3(vector, chunk_size=1000):
    base, exponent = np.frexp(vector)
    exponent_sum = np.sum(exponent)
    while len(base) > chunk_size:
        base = np.r_[base, np.ones(-len(base) % chunk_size)]
        intermediates = base.reshape(chunk_size, -1).prod(axis=0)
        base, current_exponent = np.frexp(intermediates)
        exponent_sum += np.sum(current_exponent)
    if len(base) > 1:
        base, current_exponent = np.frexp([base.prod()])
        exponent_sum += np.sum(current_exponent)
    return base[0] ** (1.0/vector.size) * 2 ** (exponent_sum/vector.size)

Context

StackExchange Code Review Q#40067, answer score: 7

Revisions (0)

No revisions yet.