
Lattice Lotka-Volterra


Problem

Can anyone comment on the efficiency of my code? I am new to Python, NumPy and SciPy, and I am trying to program a lattice biological model where a certain (one and only one) species (represented by a nonzero element) occupies a lattice square (represented in the code by a 2D array). This species can reproduce into an adjacent and empty lattice square (represented by a zero element). (Quite similar to a lattice gas model.)

The part of the code that takes the longest time is the reproduction part (def repro) because it has to check whether a site is already chosen every time a random site is generated to avoid repetition (the first while loop in def repro). Can anyone suggest a faster way to do this?

I am thinking of creating a collection of indices (2D index) where I can choose a random element and delete it at the same time so that the while loop won't be necessary. How can I do this?
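One way to get such a choose-and-delete collection (a sketch in plain Python, not code from the original post; `grid` and `n` stand in for the post's `matrix` and lattice size):

```python
import random

n = 4
grid = [[0] * n for _ in range(n)]          # empty lattice

# collect every empty site once, shuffle, then pop:
# no site is ever drawn twice, so no rejection loop is needed
empties = [(i, j) for i in range(n) for j in range(n) if grid[i][j] == 0]
random.shuffle(empties)

for _ in range(5):                          # place 5 individuals of species 1
    i, j = empties.pop()                    # choose and delete in one step
    grid[i][j] = 1
```

The accepted answer below reaches the same idea with NumPy primitives instead of a Python list.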

```
import numpy as np
import matplotlib.pyplot as plt
import random
import time
import csv

open('p14.4.csv','w').close()

t0 = time.time()

def dispose(matrix,dens,s,sp):
    limit = int(n*n*dens)
    for num in range(limit):
        i = np.random.randint(n)
        j = np.random.randint(n)
        while matrix[i,j] != 0:
            i = np.random.randint(n)
            j = np.random.randint(n)
        matrix[i,j] = sp
    return matrix

def mort(matrix,d):
    RN = np.random.random((n,n))
    # '<' and the lines that followed were mangled in the archived copy;
    # mort is reconstructed minimally: sites die where the random draw < d
    casualty = (RN < d)
    matrix[casualty] = 0
    return matrix

def repro(matrix,b):
    # [several lines lost in the archived copy; the surviving fragment redraws
    # a random neighbour offset (p,q) while it falls outside the lattice]
    while ((i+p < 0) or (j+q < 0) or (i+p > n-1) or (j+q > n-1)):
        p = np.random.randint(-1,2)
        q = np.random.randint(-1,2)
    if matrix[i+p,j+q] == 0 and b[spp-1] > np.random.random():
        matrix[i+p,j+q] = matrix[i,j]
    return matrix

def csv_writer(data,path,line):
    with open(path, "a") as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        writer.writerow(data)
        csv_file.close()
    return path

#birth rate
spp = 10
B = [0.57710226,0.57722738,0.577233935,0.577120507,0.576885731,0.576528293,
     0.576046935,0.575440457,0.574
```

Solution

There's quite a lot of code here, so for now I'm just going to look at the dispose function. You'll see that there's more than enough to say about this function for one answer.

1. Review

- There's no docstring. What does this function do? How do I call it? What values do I pass for parameters? What does it return? It's very hard to maintain code when you don't know what it is supposed to do. This doesn't have to be long-winded or laborious: a quick explanation like the following is perfectly adequate.

"""Choose n*n*dens positions in matrix with the value 0, uniformly at
random, and set them to sp. Return matrix."""


- There's no good reason to abbreviate the parameter to dens. It only appears once in the function body, and we're not running out of letters, so call it density.

- The function takes a parameter s but this is not used. Better to omit it.

- The function uses the global variable n. It would be better to pass this in as a parameter, or even better, deduce it from the size of matrix. Presumably the intention is that matrix is always an n × n array, in which case you should write matrix.size instead of n * n.

- The name dispose is vague and does not convey much to me. I think the name populate would be better.

- The function always returns matrix, which is useless, as the caller already knows about matrix.

- The name matrix is slightly misleading, as NumPy has a matrix class but this parameter does not belong to it! It's actually an ndarray. I would call the parameter array or maybe just a.

- The name sp is rather too specific. Presumably it is short for "species", but really nothing about this function depends on this value representing a species. You can imagine other types of lattice simulation where the array entries represent molecules, grains of sand, people, etc. So for generality I'd name this parameter value.

- The value 0 is arbitrary and so for extra generality it should be a function parameter (taking the default value 0). Imagine having a "speciation" operation in which you replace some fraction of a species with a descendant. It would be nice to be able to reuse this code.
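As an illustration of that kind of reuse (a hypothetical sketch, not code from the post: speciation done with a probabilistic mask rather than the function under review):

```python
import numpy as np

# hypothetical speciation step: each individual of species 1 becomes a
# descendant species 2 with probability 0.5
a = np.ones((10, 10), dtype=int)          # lattice fully occupied by species 1
mask = np.random.random(a.shape) < 0.5    # per-site coin flip
a[mask & (a == 1)] = 2                    # only occupied sites can speciate
```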

- If dens is greater than 1, then limit will be greater than the number of elements in the array, and so the function will run forever.

  Even if dens is less than 1, there might still not be enough positions in the array with value 0, and so the function will run forever.

  Perhaps you plan not to fill very much of the array, but even so it is a good idea to write robust code in case you change your plans or want to reuse this code in other programs.
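Both failure modes can be guarded against by clamping the count to the number of empty sites before looping. A sketch (dispose_safe is a hypothetical name; the unused s parameter is dropped, as noted above):

```python
import numpy as np

def dispose_safe(matrix, dens, sp):
    # never try to place more individuals than there are empty sites,
    # so the rejection loop is guaranteed to terminate
    empties = np.count_nonzero(matrix == 0)
    limit = min(int(matrix.size * dens), empties)
    for _ in range(limit):
        i = np.random.randint(matrix.shape[0])
        j = np.random.randint(matrix.shape[1])
        while matrix[i, j] != 0:
            i = np.random.randint(matrix.shape[0])
            j = np.random.randint(matrix.shape[1])
        matrix[i, j] = sp
    return matrix

a = np.zeros((4, 4))
dispose_safe(a, 2.0, 1)   # dens > 1 no longer hangs: limit is clamped to 16
```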

- The way that the positions are found is to repeatedly generate random coordinates and check to see if the position is 0. This is inefficient. Imagine that dens is 1, so that all positions must be set to sp. How long will this take, on average? This is the well-known "coupon collector's problem", so the expected number of coordinates to be randomly generated is $n^2 \log n^2$.

  Perhaps you plan not to fill very much of the array, but again, it is a good idea to write code with the right asymptotic complexity, in case you change your plans or want to reuse this code in other programs.

  So it would be better to find the positions containing 0 (using numpy.flatnonzero), then choose a random sample of those positions without replacement (using numpy.random.choice), and then set all the positions.
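In outline, those two NumPy calls work like this on a small 1-D array (the rewritten function below applies the same steps to a flattened lattice):

```python
import numpy as np

a = np.array([0, 3, 0, 0, 7, 0])
zeros = np.flatnonzero(a == 0)                     # indices of empty sites: [0, 2, 3, 5]
picks = np.random.choice(zeros, 2, replace=False)  # 2 distinct empty sites
a[picks] = 9                                       # set them all at once
```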

2. Performance

Putting all this together, I'd write the function like this:

```
def populate(array, density, value, original=0):
    """Choose array.size * density positions in array that are equal to
    original, uniformly at random, and set them to value.

    """
    a = array.reshape(-1)
    count = int(a.size * density)
    a[np.random.choice(np.flatnonzero(a == original), count, False)] = value
```


Note the use of numpy.reshape to create a flattened view of array, so that we can use numpy.random.choice, which requires a 1-D array. We then use this view to update the original array.
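The view semantics matter here: for a contiguous array, numpy.reshape returns a view, whereas numpy.ndarray.flatten always returns a copy. A small illustration:

```python
import numpy as np

a = np.zeros((2, 3))
a.reshape(-1)[4] = 9     # reshape gives a view: the write reaches a[1, 1]
a.flatten()[5] = 9       # flatten gives a copy: this write is lost
```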

However, this has unacceptably poor performance compared to dispose when density is small:

```
>>> a = np.zeros((1000, 1000))
>>> timeit(lambda:dispose(a, 0.01, None, 1), number=1)
0.0184222329990007
>>> timeit(lambda:populate(a, 0.01, 1), number=1)
0.2894051409966778
```


The reason for this poor performance is that numpy.random.choice computes a sample without replacement by permuting the whole array and using an initial prefix of the permutation. This is expensively wasteful when only a small part of the permutation is needed.
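Sampling with replacement is cheap precisely because no permutation is built; duplicates are then discarded, so a single draw may come up short, which is why the implementation below loops. An illustration (the sizes are arbitrary):

```python
import numpy as np

draw = np.random.choice(100, 10)   # with replacement: fast, may repeat values
kept = np.unique(draw)             # sorted distinct values; kept.size <= 10
# a caller must loop, drawing again, until enough distinct positions are found
```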

So what about sampling with replacement instead (which is fast), eliminating duplicates using numpy.unique and looping until we have enough choices (as in the original post)? Here's the implementation:

```
def populate(array, density, value, original=0):
    """Choose array.size * density positions in array that are equal to
    original, uniformly at random, and set them to value.

    """
    a = array.reshape(-1)
    count = int(a.size * density)
    while count:
        p = np.random.choice(a.size, count)
        p = np.unique(p[a[p] == original])
        a[p] = value
        count -= p.size
```

This version performs much better when density is small:

```
>>> a = np.zeros((1000,1000))
>>> timeit(lambda:populate(a, 0.01, 1), number=1)
0.0019981740042567253
```

Context

StackExchange Code Review Q#84742, answer score: 2
