patternpythonMinor

Improve performance of math function applied to arrays

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

arraysfunctionappliedimprovemathperformance

Problem

This block of code is already somewhat optimized thanks to some answers given over at Stack Overflow. In the last question I made (Improve performance of function without parallelization) the code was marginally improved but I later had to re-write a little bit of it and the changes proposed didn't apply anymore.

I'm looking at optimizing this block of code without resorting to parallelization, other than that I'm pretty much open to using any package out there.

I think this is perhaps the correct site to post this since it's really more of a question about how can I improve my code rather than something not working with it.

Here's the MWE (minimum working example):

import numpy as np
import timeit

def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N)

# Data lists.
array1 = np.array([random_data(4) for _ in range(1000)])
array2 = np.array([random_data(4) for _ in range(2000)])

def func():
    lst = []

    for elem in array1:

        # Define factors.
        a_01, a_11 = max(elem[1], 1e-10), max(elem[3], 1e-10)
        a_02, a_12 = array2[:,1], array2[:,3]
        # Combine the factors defined above.
        a_0 = a_01**2 + a_02**2
        a_1 = a_11**2 + a_12**2
        Bs, Cs = -0.5/a_0, -0.5/a_1

        # Perform operations.
        A = 1./(np.sqrt(a_0*a_1))
        B = Bs*(elem[0]-array2[:,0])**2
        C = Cs*(elem[2]-array2[:,2])**2
        ABC = A*np.exp(B+C)

        lst.append(max(ABC.sum(), 1e-10))

    return lst

# Time function.
func_time = timeit.timeit(func, number=10000)
print func_time

Solution

Well, some things I noticed:

# Define factors.
    a_01, a_11 = max(elem[1], 1e-10), max(elem[3], 1e-10)
    a_02, a_12 = array2[:,1], array2[:,3]

a_02 and a_12 are static across the loop, no need to redefine them every time. a_01 and a_12 could be defined array-wise using np.maximum:

a_01 = np.maximum(array1[:,1], np.array(1e-10 for _ in xrange(len(array1))
a_11 = np.maximum(array1[:,3], np.array(1e-10 for _ in xrange(len(array1))

However, these are trivial improvements; function call overhead is often blamed for much in the Python optimisation world, but there is no way that removing 2n calls to max will improve things here. With that said, following the static data path further, we come across an instance of

a_0 = a_01**2 + a_02**2
a_1 = a_11**2 + a_12**2

If we instead define

a_022 = a_02**2
a_122 = a_12**2

at the beginning of the function, we can avoid n multiplications by replacing later occurrences of a_02**2 with a_022 (and likewise for a_12), which reduces the running time of the program by about 5% on my machine. That isn't too impressive, though I'm sure you should be able to knock that down a fair bit by doing more thing using numpy functions. However, my numpy skills evaporate when it comes to 2d arrays, so I can't help.

It is worth mentioning that the algorithm you are using is O(n^2) at least, so wouldn't get your hopes up too high. Otherwise, I'd recommend looking at Cython as this seems like a problem it would excel at, though you've mentioned that parallelism isn't an option, so I assume using C extensions won't be either. However, if it is, I think you will find great time savings there.

This was the final code I came up with:

def func2():
    lst = []
    # Define factors.
    a_02, a_12 = array2[:,1], array2[:,3]
    a_022 = a_02 ** 2
    a_122 = a_12 ** 2
    e10 = np.array([1e-10 for _ in xrange(len(array1))])
    a_01 = np.maximum(array1[:,1], e10)
    a_11 = np.maximum(array1[:,3], e10)
    for i, elem in enumerate(array1):
        # Combine the factors defined above.
        a_0 = a_01[i]**2 + a_022
        a_1 = a_11[i]**2 + a_122
        Bs, Cs = -0.5/a_0, -0.5/a_1
        # Perform operations.
        A = 1./(np.sqrt(a_0*a_1))
        B = Bs*(elem[0]-array2[:,0])**2
        C = Cs*(elem[2]-array2[:,2])**2
        ABC = A*np.exp(B+C)
        lst.append(max(ABC.sum(), 1e-10))
    return lst

N = 100
func_time = timeit.timeit(func, number=N)
print "Original\t", func_time / N
print "-" * 50

func2_time = timeit.timeit(func2, number=N)
print "Revision 1\t", func2_time / N
print "Improvement\t", str(int((func_time - func2_time) / func_time * 100)) + "%"
print "-" * 50

Code Snippets

# Define factors.
    a_01, a_11 = max(elem[1], 1e-10), max(elem[3], 1e-10)
    a_02, a_12 = array2[:,1], array2[:,3]

a_01 = np.maximum(array1[:,1], np.array(1e-10 for _ in xrange(len(array1))
a_11 = np.maximum(array1[:,3], np.array(1e-10 for _ in xrange(len(array1))

a_0 = a_01**2 + a_02**2
a_1 = a_11**2 + a_12**2

a_022 = a_02**2
a_122 = a_12**2

def func2():
    lst = []
    # Define factors.
    a_02, a_12 = array2[:,1], array2[:,3]
    a_022 = a_02 ** 2
    a_122 = a_12 ** 2
    e10 = np.array([1e-10 for _ in xrange(len(array1))])
    a_01 = np.maximum(array1[:,1], e10)
    a_11 = np.maximum(array1[:,3], e10)
    for i, elem in enumerate(array1):
        # Combine the factors defined above.
        a_0 = a_01[i]**2 + a_022
        a_1 = a_11[i]**2 + a_122
        Bs, Cs = -0.5/a_0, -0.5/a_1
        # Perform operations.
        A = 1./(np.sqrt(a_0*a_1))
        B = Bs*(elem[0]-array2[:,0])**2
        C = Cs*(elem[2]-array2[:,2])**2
        ABC = A*np.exp(B+C)
        lst.append(max(ABC.sum(), 1e-10))
    return lst

N = 100
func_time = timeit.timeit(func, number=N)
print "Original\t", func_time / N
print "-" * 50

func2_time = timeit.timeit(func2, number=N)
print "Revision 1\t", func2_time / N
print "Improvement\t", str(int((func_time - func2_time) / func_time * 100)) + "%"
print "-" * 50

Context

StackExchange Code Review Q#41993, answer score: 5

Revisions (0)

No revisions yet.