patternpythonMinor
Improve performance of math function applied to arrays
Viewed 0 times
arraysfunctionappliedimprovemathperformance
Problem
This block of code is already somewhat optimized thanks to some answers given over at Stack Overflow. In the last question I made (Improve performance of function without parallelization) the code was marginally improved but I later had to re-write a little bit of it and the changes proposed didn't apply anymore.
I'm looking at optimizing this block of code without resorting to parallelization, other than that I'm pretty much open to using any package out there.
I think this is perhaps the correct site to post this since it's really more of a question about how can I improve my code rather than something not working with it.
Here's the MWE (minimum working example):
I'm looking at optimizing this block of code without resorting to parallelization, other than that I'm pretty much open to using any package out there.
I think this is perhaps the correct site to post this since it's really more of a question about how can I improve my code rather than something not working with it.
Here's the MWE (minimum working example):
import numpy as np
import timeit
def random_data(N):
# Generate some random data.
return np.random.uniform(0., 10., N)
# Data lists.
array1 = np.array([random_data(4) for _ in range(1000)])
array2 = np.array([random_data(4) for _ in range(2000)])
def func():
lst = []
for elem in array1:
# Define factors.
a_01, a_11 = max(elem[1], 1e-10), max(elem[3], 1e-10)
a_02, a_12 = array2[:,1], array2[:,3]
# Combine the factors defined above.
a_0 = a_01**2 + a_02**2
a_1 = a_11**2 + a_12**2
Bs, Cs = -0.5/a_0, -0.5/a_1
# Perform operations.
A = 1./(np.sqrt(a_0*a_1))
B = Bs*(elem[0]-array2[:,0])**2
C = Cs*(elem[2]-array2[:,2])**2
ABC = A*np.exp(B+C)
lst.append(max(ABC.sum(), 1e-10))
return lst
# Time function.
func_time = timeit.timeit(func, number=10000)
print func_timeSolution
Well, some things I noticed:
However, these are trivial improvements; function call overhead is often blamed for much in the Python optimisation world, but there is no way that removing 2n calls to max will improve things here. With that said, following the static data path further, we come across an instance of
If we instead define
at the beginning of the function, we can avoid n multiplications by replacing later occurrences of
It is worth mentioning that the algorithm you are using is
This was the final code I came up with:
# Define factors.
a_01, a_11 = max(elem[1], 1e-10), max(elem[3], 1e-10)
a_02, a_12 = array2[:,1], array2[:,3]a_02 and a_12 are static across the loop, no need to redefine them every time. a_01 and a_12 could be defined array-wise using np.maximum:a_01 = np.maximum(array1[:,1], np.array(1e-10 for _ in xrange(len(array1))
a_11 = np.maximum(array1[:,3], np.array(1e-10 for _ in xrange(len(array1))However, these are trivial improvements; function call overhead is often blamed for much in the Python optimisation world, but there is no way that removing 2n calls to max will improve things here. With that said, following the static data path further, we come across an instance of
a_0 = a_01**2 + a_02**2
a_1 = a_11**2 + a_12**2If we instead define
a_022 = a_02**2
a_122 = a_12**2at the beginning of the function, we can avoid n multiplications by replacing later occurrences of
a_02**2 with a_022 (and likewise for a_12), which reduces the running time of the program by about 5% on my machine. That isn't too impressive, though I'm sure you should be able to knock that down a fair bit by doing more thing using numpy functions. However, my numpy skills evaporate when it comes to 2d arrays, so I can't help.It is worth mentioning that the algorithm you are using is
O(n^2) at least, so wouldn't get your hopes up too high. Otherwise, I'd recommend looking at Cython as this seems like a problem it would excel at, though you've mentioned that parallelism isn't an option, so I assume using C extensions won't be either. However, if it is, I think you will find great time savings there.This was the final code I came up with:
def func2():
lst = []
# Define factors.
a_02, a_12 = array2[:,1], array2[:,3]
a_022 = a_02 ** 2
a_122 = a_12 ** 2
e10 = np.array([1e-10 for _ in xrange(len(array1))])
a_01 = np.maximum(array1[:,1], e10)
a_11 = np.maximum(array1[:,3], e10)
for i, elem in enumerate(array1):
# Combine the factors defined above.
a_0 = a_01[i]**2 + a_022
a_1 = a_11[i]**2 + a_122
Bs, Cs = -0.5/a_0, -0.5/a_1
# Perform operations.
A = 1./(np.sqrt(a_0*a_1))
B = Bs*(elem[0]-array2[:,0])**2
C = Cs*(elem[2]-array2[:,2])**2
ABC = A*np.exp(B+C)
lst.append(max(ABC.sum(), 1e-10))
return lst
N = 100
func_time = timeit.timeit(func, number=N)
print "Original\t", func_time / N
print "-" * 50
func2_time = timeit.timeit(func2, number=N)
print "Revision 1\t", func2_time / N
print "Improvement\t", str(int((func_time - func2_time) / func_time * 100)) + "%"
print "-" * 50Code Snippets
# Define factors.
a_01, a_11 = max(elem[1], 1e-10), max(elem[3], 1e-10)
a_02, a_12 = array2[:,1], array2[:,3]a_01 = np.maximum(array1[:,1], np.array(1e-10 for _ in xrange(len(array1))
a_11 = np.maximum(array1[:,3], np.array(1e-10 for _ in xrange(len(array1))a_0 = a_01**2 + a_02**2
a_1 = a_11**2 + a_12**2a_022 = a_02**2
a_122 = a_12**2def func2():
lst = []
# Define factors.
a_02, a_12 = array2[:,1], array2[:,3]
a_022 = a_02 ** 2
a_122 = a_12 ** 2
e10 = np.array([1e-10 for _ in xrange(len(array1))])
a_01 = np.maximum(array1[:,1], e10)
a_11 = np.maximum(array1[:,3], e10)
for i, elem in enumerate(array1):
# Combine the factors defined above.
a_0 = a_01[i]**2 + a_022
a_1 = a_11[i]**2 + a_122
Bs, Cs = -0.5/a_0, -0.5/a_1
# Perform operations.
A = 1./(np.sqrt(a_0*a_1))
B = Bs*(elem[0]-array2[:,0])**2
C = Cs*(elem[2]-array2[:,2])**2
ABC = A*np.exp(B+C)
lst.append(max(ABC.sum(), 1e-10))
return lst
N = 100
func_time = timeit.timeit(func, number=N)
print "Original\t", func_time / N
print "-" * 50
func2_time = timeit.timeit(func2, number=N)
print "Revision 1\t", func2_time / N
print "Improvement\t", str(int((func_time - func2_time) / func_time * 100)) + "%"
print "-" * 50Context
StackExchange Code Review Q#41993, answer score: 5
Revisions (0)
No revisions yet.