HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Computation of population patterns in simple agent-based model

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
simplecomputationagentpatternsbasedpopulationmodel

Problem

I've modeled the population changes for a group of agents that inhabit a 2D lattice (with wrapped boundaries) made up of M x M number of grids. At each time step:

  • A parent reproduces offsprings that disperse independently -- the number depends on the grid location.



  • The offsprings live if they land on a "patch" grid but are removed otherwise.



  • Neighboring offsprings compete based on local density.



  • Survived offsprings grows up, becoming parents the next time step.



As an inexperienced programmer, my current Python script is pretty inefficient, taking as long as 19 minutes(!!) for the given parameters. I suspect I'm relying too much on NumPy functions but cannot see clear ways to optimize it.

```
import numpy as np
import random
import math as mp
import matplotlib.pyplot as plt
import time

# Lattice parameters
M = 60
patches = np.random.choice(2, M * M).reshape(M, -1)

# Biological parameters
mu = 14.5
alpha = .1
sigma = 5.

def reproduction(a_):
da = np.floor(a_) # discretize parent locations
z = np.random.normal(0, 0.1, M**2).reshape(M,-1) # generate noise in reproduction
muloc = np.ones(len(a_)) # mean reproduction rate per location
for i in xrange(len(a_)):
muloc[i] = mu * da[i,0]**0.01 + z[da[i,0], da[i,1]] # grid-dependency
if muloc[i]<0: # mean reproduction rate cannot be negative
muloc[i]=0
return muloc

def dispersal(self):
return np.random.multivariate_normal([self[0], self[1]], [[sigma21, 0], [0, 1sigma2]])%M # Gaussian dispersal

def landed(offspring_list):
Inlist = [] # landed offspring list
og = np.zeros((M, M)) # lattice defined by the number of landed offsprings per grid

dl = np.floor(offspring_list) # discretize offspring locations
for i in range(len(offspring_list)):
if patches[dl[i,0],dl[i,1]] == 1:
Inlist.append(offspring_list[i])
return Inlist

# competition kernel
ker = np.zeros((M, M))
for i in range(M):

Solution

I profiled your code using pycallgraph to find the bottlenecks. Two observations:

  • Over 80% of the time is spent in a numpy function called ndarray.dispersal, so we need to find ways to cut down the use of numpy’s array functions.



  • The most expensive user function is competition(), so we should start there first.



Python’s builtin structures are often a lot faster than numpy’s for simple data, so that’s always where I start when I’m trying to speed up some numpy code. The fact that ndarray is the main culprit strengthens the idea that overuse of numpy’s structures is the problem.

-
In competition, rather than declaring go with np.zeros, I create it as a list of lists:

og = [[0] * M] * M


You need to make a few tweaks in competition() to modify the way indices are handled; you need to drop in this line instead:

og[int(dl[i,0])][int(dl[i,1])] += 1


This means that every time you access og, you’re using native Python lists rather than Numpy’s functions. This gave me a big speed boost.

-
Running the profiler again, competition() is still the main culprit. What else can we change?

I haven’t found anything that gets as good an improvement as fixing og, but here are some other small optimisations we can make:

-
Since the ker variable never changes, recalculating v1 every time we call competition() is wasteful. Pull the computation of v1 out of the function, and put it next to ker.

-
Build density as a list of lists, not a numpy array. Again, we make some small savings:

density = [[1]] * len(dl)


-
Running the profiler a third time, the reproduction() function is pretty close to the time of competition(). What can we do here?

I haven’t looked into this in detail, but I think your random matrix is probably the cause of the problem. I think it’s slow to index, and that’s dragging you down. Consider converting it into a list of lists?

-
After that, I’d probably start looking through landed(), which is the next slowest user function. You can drop the variable og, as it goes unused. Not sure what else, if anything, you could do here.

That’s about as much as I want to do for now, but hopefully it gives you a few pointers in the right direction. The code is certainly faster than I left it, anyway.

Summary: don’t over-rely on numpy’s structures. Sometimes the Python builtin structures are as good as, or better. Use a profiler to determine what’s more appropriate.

Some other general optimisations and/or code tweaks that I'd suggest:

-
In your print statements, you often have extra spaces. When you print a series of strings separated by commas, Python always adds a space between them for you. Compare the output of these two lines:

print 'simulation number ', s+1, ' of ', sim_num
print 'simulation number', s+1, 'of', sim_num


and you should see the extra spaces go away.

-
The parameters seem somewhat scattered. Put all your variables right at the top of the file, so they’re easy to find and edit.

-
Rather than running code in the top-level, move it into a main() function and put this at the bottom of the file:

if __name__ == '__main__':
    main()


This means that if the script is run directly, you get all the plotting as before, but you can also import functions from this file into another script. That’s useful for reusing code – it’s the best of both worlds.

-
Don't use self as a variable name in dispersal; by convention, that's used for classes, and it's best not to use it to avoid confusion.

-
In your dispersal function, the mean is the input of the function, and the covariance matrix is constant (as sigma is constant). I'd thus suggest rewriting the function this way:

# Convention dictates that constants are in UPPERCASE_WITH_UNDERSCORES 
M = 60
SIGMA = 5
COV = [[SIGMA ** 2, 0], [0, SIGMA ** 2]]

def dispersal(mean):
    return np.random.multivariate_normal(mean=mean, cov=COV)


-
Doing import math as mp just causes unnecessary confusion. It’s fine to do import math.

Code Snippets

og = [[0] * M] * M
og[int(dl[i,0])][int(dl[i,1])] += 1
density = [[1]] * len(dl)
print 'simulation number ', s+1, ' of ', sim_num
print 'simulation number', s+1, 'of', sim_num
if __name__ == '__main__':
    main()

Context

StackExchange Code Review Q#90590, answer score: 4

Revisions (0)

No revisions yet.