patternpythonMinor
Calculating the push or pull force of a molecule at a vertex
Viewed 0 times
thepullforcemoleculevertexcalculatingpush
Problem
The basic idea of the code is to calculate the push or pull force at a vertex, given the number of "push causing molecules" and "pull causing molecules" at a polygon vertex.
The code thus mainly performs arithmetic (including the function it calls,
Here is my code, in plain Python (note that
```
def calculate_point_forces(self, PushMolecules, PushMolecules_iPlus1, PushMolecules_iMinus1, PullMolecules, PullMolecules_iPlus1, PullMolecules_iMinus1, point_index, num_nodes, polygon, polygon_compression, point_interpolygon_contact, push_vector, pull_vector, a1, a2):
max_push_force = self.max_push_force
max_pull_force = self.max_pull_force
tangent_factor = self.tangent_factor
rac_force_exponent = self.rac_force_exponent
rho_force_exponent = self.rho_force_exponent
avg_PushMolecules_plus1 = (PushMolecules + PushMolecules_iPlus1)*0.5
avg_PullMolecules_plus1 = (PullMolecules + PullMolecules_iPlus1)*0.5
avg_PushMolecules_minus1 = (PushMolecules + PushMolecules_iMinus1)*0.5
avg_PullMolecules_minus1 = (PullMolecules + PullMolecules_iMinus1)*0.5
if self.enable_compression_stiffening == True:
if polygon_compression PullMolecules:
Fpush = int(not point_interpolygon_contact)max_push_force(1 - (1/(1 + (PushMolecules/a2)**rac_force_exponent)))*push_vector
Fpull = 0
Fe_pull = 0
elif PushMolecules < PullMolecules:
pull_magnitude = max_pull_force*(1 - (1/(1 + (PullMolecules/a1)**rho_force_exponent)))
Fpull = (1-tangent_factor)pull_magnitudepull_vector
Fe_pull = 0.5tangent_factorpull_magnitude
Fpush = 0
The code thus mainly performs arithmetic (including the function it calls,
calculate_edge_force), and has a couple of if-statements. The function is called many times since it is required in order to compute the derivatives, which is later used by scipy.odeint for integration. Here is my code, in plain Python (note that
import numpy as np has been put in somewhere higher up in the file, and that the functions are inside a class, hence there are references to this as self.):```
def calculate_point_forces(self, PushMolecules, PushMolecules_iPlus1, PushMolecules_iMinus1, PullMolecules, PullMolecules_iPlus1, PullMolecules_iMinus1, point_index, num_nodes, polygon, polygon_compression, point_interpolygon_contact, push_vector, pull_vector, a1, a2):
max_push_force = self.max_push_force
max_pull_force = self.max_pull_force
tangent_factor = self.tangent_factor
rac_force_exponent = self.rac_force_exponent
rho_force_exponent = self.rho_force_exponent
avg_PushMolecules_plus1 = (PushMolecules + PushMolecules_iPlus1)*0.5
avg_PullMolecules_plus1 = (PullMolecules + PullMolecules_iPlus1)*0.5
avg_PushMolecules_minus1 = (PushMolecules + PushMolecules_iMinus1)*0.5
avg_PullMolecules_minus1 = (PullMolecules + PullMolecules_iMinus1)*0.5
if self.enable_compression_stiffening == True:
if polygon_compression PullMolecules:
Fpush = int(not point_interpolygon_contact)max_push_force(1 - (1/(1 + (PushMolecules/a2)**rac_force_exponent)))*push_vector
Fpull = 0
Fe_pull = 0
elif PushMolecules < PullMolecules:
pull_magnitude = max_pull_force*(1 - (1/(1 + (PullMolecules/a1)**rho_force_exponent)))
Fpull = (1-tangent_factor)pull_magnitudepull_vector
Fe_pull = 0.5tangent_factorpull_magnitude
Fpush = 0
Solution
First of all you probably should type your numpy arrays but be warned, whether you use the
In this case, your code doesn't actually use any array functions that would benefit from the arrays being properly typed so you'd gain nothing in the mere act of typing the arrays. The overhead of calling a python function is significant so you might benefit from not using the numpy array functions, if the numpy arrays are short and if the functionality is straightforward to reproduce as cython code using array indexing (fast) and if the function is called many times. The profiler should tell you this stuff.
I can see some low-hanging fruit in the above code, take this line, mentioned to be a major offender:
What happens here, is
What you would do is rewrite that to eliminate the python function call (and thus all the intermediate python objects). As it happens, if you use the python built-in 'max' on typed variables, cython is smart enough to eliminate the python call entirely:
That will automatically become optimal inline c code for calculating the max of the 3 variables (Cython actually unrolls it as if/else statements)
The next low-hanging fruit, and another stated offender is this:
In some cases Cython optimizes away tuple-unpacking, but it doesn't in the case of function calls. So what is going to happen is cython will create an intermediate tuple (a python object), pack stuff into the tuple (Again, converting to python objects, as tuples can't hold c types), then unpack the tuple and convert the python objects to their c types. Again this should be turned into something that translates readily to c. Unfortunately in this case there is no faster better cleaner syntax, you would probably be best served by making a 'pair' struct and manually packing and unpacking it, something like:
Using a struct is by no means the only option here, but it's going to be faster than using a small array. You could also use the C++ 'pair' template which would be just as fast as a struct but it involves bringing in C++. For that matter you could also pass in the floats by pointers. None of these are clearly superior in either speed or elegance, but all will compile down to clean c code.
I imagine if you root out those big ghoulies of python calls you'll get a significant speed improvement. After that it's a matter of using cython -a on the .pyx file. The generated html file shows 'unoptimized' lines in yellow/orange, you can click on a line to see the generated c code. From there you can figure out how to eliminate the python calls - if it is worth it.
np.ndarray notation, or the memory view notation, only limited operations occur at the speed of c, that is indexing, 'shape', and a very few others. With a memory view you can only use these fast operations, with a ndarray type you can access all the python operations too but they still occur at the speed of python, not c.In this case, your code doesn't actually use any array functions that would benefit from the arrays being properly typed so you'd gain nothing in the mere act of typing the arrays. The overhead of calling a python function is significant so you might benefit from not using the numpy array functions, if the numpy arrays are short and if the functionality is straightforward to reproduce as cython code using array indexing (fast) and if the function is called many times. The profiler should tell you this stuff.
I can see some low-hanging fruit in the above code, take this line, mentioned to be a major offender:
Fe_pull = np.max([Fe_pull_minus1, Fe_pull_plus1, Fe_pull])What happens here, is
np.max is a call to a python function of a python module, the list will be constructed as a python list and the floats will be converted to python floats because a python list can't hold c floats. Then that code will all run at the speed of python. np.max will then unpack all that stuff and the result of np.max will be a python float which is converted to a c float. This will be slow!. In fact it may even be slower than python, because it involves creating extra temporary python objects.What you would do is rewrite that to eliminate the python function call (and thus all the intermediate python objects). As it happens, if you use the python built-in 'max' on typed variables, cython is smart enough to eliminate the python call entirely:
Fe_pull = max(Fe_pull_minus1, Fe_pull_plus1, Fe_pull)That will automatically become optimal inline c code for calculating the max of the 3 variables (Cython actually unrolls it as if/else statements)
The next low-hanging fruit, and another stated offender is this:
Fe_plus, Fe_plus_ = self.calculate_edge_force( ...In some cases Cython optimizes away tuple-unpacking, but it doesn't in the case of function calls. So what is going to happen is cython will create an intermediate tuple (a python object), pack stuff into the tuple (Again, converting to python objects, as tuples can't hold c types), then unpack the tuple and convert the python objects to their c types. Again this should be turned into something that translates readily to c. Unfortunately in this case there is no faster better cleaner syntax, you would probably be best served by making a 'pair' struct and manually packing and unpacking it, something like:
#At the top level
cdef struct float_pair:
float a
float b
# In the class method definition
cdef floar_pair calculate_edge_force(...
...
cdef float_pair result
result.a = ...
result.b = ...
return result
# In your function
cdef float_pair result = self.calculate_edge_force( ...
Fe_plus = result.a
Fe_plus_ = result.bUsing a struct is by no means the only option here, but it's going to be faster than using a small array. You could also use the C++ 'pair' template which would be just as fast as a struct but it involves bringing in C++. For that matter you could also pass in the floats by pointers. None of these are clearly superior in either speed or elegance, but all will compile down to clean c code.
I imagine if you root out those big ghoulies of python calls you'll get a significant speed improvement. After that it's a matter of using cython -a on the .pyx file. The generated html file shows 'unoptimized' lines in yellow/orange, you can click on a line to see the generated c code. From there you can figure out how to eliminate the python calls - if it is worth it.
Code Snippets
Fe_pull = np.max([Fe_pull_minus1, Fe_pull_plus1, Fe_pull])Fe_pull = max(Fe_pull_minus1, Fe_pull_plus1, Fe_pull)Fe_plus, Fe_plus_ = self.calculate_edge_force( ...#At the top level
cdef struct float_pair:
float a
float b
# In the class method definition
cdef floar_pair calculate_edge_force(...
...
cdef float_pair result
result.a = ...
result.b = ...
return result
# In your function
cdef float_pair result = self.calculate_edge_force( ...
Fe_plus = result.a
Fe_plus_ = result.bContext
StackExchange Code Review Q#64040, answer score: 4
Revisions (0)
No revisions yet.