patternpythonMinor
Computation of population patterns in simple agent-based model
Viewed 0 times
simplecomputationagentpatternsbasedpopulationmodel
Problem
I've modeled the population changes for a group of agents that inhabit a 2D lattice (with wrapped boundaries) made up of
As an inexperienced programmer, my current Python script is pretty inefficient, taking as long as 19 minutes(!!) for the given parameters. I suspect I'm relying too much on NumPy functions but cannot see clear ways to optimize it.
```
import numpy as np
import random
import math as mp
import matplotlib.pyplot as plt
import time
# Lattice parameters
M = 60
patches = np.random.choice(2, M * M).reshape(M, -1)
# Biological parameters
mu = 14.5
alpha = .1
sigma = 5.
def reproduction(a_):
da = np.floor(a_) # discretize parent locations
z = np.random.normal(0, 0.1, M**2).reshape(M,-1) # generate noise in reproduction
muloc = np.ones(len(a_)) # mean reproduction rate per location
for i in xrange(len(a_)):
muloc[i] = mu * da[i,0]**0.01 + z[da[i,0], da[i,1]] # grid-dependency
if muloc[i]<0: # mean reproduction rate cannot be negative
muloc[i]=0
return muloc
def dispersal(self):
return np.random.multivariate_normal([self[0], self[1]], [[sigma21, 0], [0, 1sigma2]])%M # Gaussian dispersal
def landed(offspring_list):
Inlist = [] # landed offspring list
og = np.zeros((M, M)) # lattice defined by the number of landed offsprings per grid
dl = np.floor(offspring_list) # discretize offspring locations
for i in range(len(offspring_list)):
if patches[dl[i,0],dl[i,1]] == 1:
Inlist.append(offspring_list[i])
return Inlist
# competition kernel
ker = np.zeros((M, M))
for i in range(M):
M x M number of grids. At each time step: - A parent reproduces offsprings that disperse independently -- the number depends on the grid location.
- The offsprings live if they land on a "patch" grid but are removed otherwise.
- Neighboring offsprings compete based on local density.
- Survived offsprings grows up, becoming parents the next time step.
As an inexperienced programmer, my current Python script is pretty inefficient, taking as long as 19 minutes(!!) for the given parameters. I suspect I'm relying too much on NumPy functions but cannot see clear ways to optimize it.
```
import numpy as np
import random
import math as mp
import matplotlib.pyplot as plt
import time
# Lattice parameters
M = 60
patches = np.random.choice(2, M * M).reshape(M, -1)
# Biological parameters
mu = 14.5
alpha = .1
sigma = 5.
def reproduction(a_):
da = np.floor(a_) # discretize parent locations
z = np.random.normal(0, 0.1, M**2).reshape(M,-1) # generate noise in reproduction
muloc = np.ones(len(a_)) # mean reproduction rate per location
for i in xrange(len(a_)):
muloc[i] = mu * da[i,0]**0.01 + z[da[i,0], da[i,1]] # grid-dependency
if muloc[i]<0: # mean reproduction rate cannot be negative
muloc[i]=0
return muloc
def dispersal(self):
return np.random.multivariate_normal([self[0], self[1]], [[sigma21, 0], [0, 1sigma2]])%M # Gaussian dispersal
def landed(offspring_list):
Inlist = [] # landed offspring list
og = np.zeros((M, M)) # lattice defined by the number of landed offsprings per grid
dl = np.floor(offspring_list) # discretize offspring locations
for i in range(len(offspring_list)):
if patches[dl[i,0],dl[i,1]] == 1:
Inlist.append(offspring_list[i])
return Inlist
# competition kernel
ker = np.zeros((M, M))
for i in range(M):
Solution
I profiled your code using pycallgraph to find the bottlenecks. Two observations:
Python’s builtin structures are often a lot faster than numpy’s for simple data, so that’s always where I start when I’m trying to speed up some numpy code. The fact that
-
In
You need to make a few tweaks in
This means that every time you access
-
Running the profiler again,
I haven’t found anything that gets as good an improvement as fixing
-
Since the
-
Build
-
Running the profiler a third time, the
I haven’t looked into this in detail, but I think your random matrix is probably the cause of the problem. I think it’s slow to index, and that’s dragging you down. Consider converting it into a list of lists?
-
After that, I’d probably start looking through
That’s about as much as I want to do for now, but hopefully it gives you a few pointers in the right direction. The code is certainly faster than I left it, anyway.
Summary: don’t over-rely on numpy’s structures. Sometimes the Python builtin structures are as good as, or better. Use a profiler to determine what’s more appropriate.
Some other general optimisations and/or code tweaks that I'd suggest:
-
In your print statements, you often have extra spaces. When you print a series of strings separated by commas, Python always adds a space between them for you. Compare the output of these two lines:
and you should see the extra spaces go away.
-
The parameters seem somewhat scattered. Put all your variables right at the top of the file, so they’re easy to find and edit.
-
Rather than running code in the top-level, move it into a
This means that if the script is run directly, you get all the plotting as before, but you can also import functions from this file into another script. That’s useful for reusing code – it’s the best of both worlds.
-
Don't use
-
In your
-
Doing
- Over 80% of the time is spent in a numpy function called
ndarray.dispersal, so we need to find ways to cut down the use of numpy’s array functions.
- The most expensive user function is
competition(), so we should start there first.
Python’s builtin structures are often a lot faster than numpy’s for simple data, so that’s always where I start when I’m trying to speed up some numpy code. The fact that
ndarray is the main culprit strengthens the idea that overuse of numpy’s structures is the problem.-
In
competition, rather than declaring go with np.zeros, I create it as a list of lists:og = [[0] * M] * MYou need to make a few tweaks in
competition() to modify the way indices are handled; you need to drop in this line instead:og[int(dl[i,0])][int(dl[i,1])] += 1This means that every time you access
og, you’re using native Python lists rather than Numpy’s functions. This gave me a big speed boost.-
Running the profiler again,
competition() is still the main culprit. What else can we change?I haven’t found anything that gets as good an improvement as fixing
og, but here are some other small optimisations we can make:-
Since the
ker variable never changes, recalculating v1 every time we call competition() is wasteful. Pull the computation of v1 out of the function, and put it next to ker.-
Build
density as a list of lists, not a numpy array. Again, we make some small savings:density = [[1]] * len(dl)-
Running the profiler a third time, the
reproduction() function is pretty close to the time of competition(). What can we do here?I haven’t looked into this in detail, but I think your random matrix is probably the cause of the problem. I think it’s slow to index, and that’s dragging you down. Consider converting it into a list of lists?
-
After that, I’d probably start looking through
landed(), which is the next slowest user function. You can drop the variable og, as it goes unused. Not sure what else, if anything, you could do here.That’s about as much as I want to do for now, but hopefully it gives you a few pointers in the right direction. The code is certainly faster than I left it, anyway.
Summary: don’t over-rely on numpy’s structures. Sometimes the Python builtin structures are as good as, or better. Use a profiler to determine what’s more appropriate.
Some other general optimisations and/or code tweaks that I'd suggest:
-
In your print statements, you often have extra spaces. When you print a series of strings separated by commas, Python always adds a space between them for you. Compare the output of these two lines:
print 'simulation number ', s+1, ' of ', sim_num
print 'simulation number', s+1, 'of', sim_numand you should see the extra spaces go away.
-
The parameters seem somewhat scattered. Put all your variables right at the top of the file, so they’re easy to find and edit.
-
Rather than running code in the top-level, move it into a
main() function and put this at the bottom of the file:if __name__ == '__main__':
main()This means that if the script is run directly, you get all the plotting as before, but you can also import functions from this file into another script. That’s useful for reusing code – it’s the best of both worlds.
-
Don't use
self as a variable name in dispersal; by convention, that's used for classes, and it's best not to use it to avoid confusion.-
In your
dispersal function, the mean is the input of the function, and the covariance matrix is constant (as sigma is constant). I'd thus suggest rewriting the function this way:# Convention dictates that constants are in UPPERCASE_WITH_UNDERSCORES
M = 60
SIGMA = 5
COV = [[SIGMA ** 2, 0], [0, SIGMA ** 2]]
def dispersal(mean):
return np.random.multivariate_normal(mean=mean, cov=COV)-
Doing
import math as mp just causes unnecessary confusion. It’s fine to do import math.Code Snippets
og = [[0] * M] * Mog[int(dl[i,0])][int(dl[i,1])] += 1density = [[1]] * len(dl)print 'simulation number ', s+1, ' of ', sim_num
print 'simulation number', s+1, 'of', sim_numif __name__ == '__main__':
main()Context
StackExchange Code Review Q#90590, answer score: 4
Revisions (0)
No revisions yet.