HiveBrain v1.2.0
Get Started
← Back to all entries
principlepythonMinor

Code Review of small scientific project, particuarly array vs. list perform

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
particuarlyarrayscientificprojectperformsmallcodelistreview

Problem

I'm new to python but not to programming in general. This is my first project beyond the basics.

I was wondering if I could get some feedback on the code in general, in particular on any bad Java/C++ habits I might be dragging along.

I have some specific questions regarding performance of numpy's array vs. lists. As you can see from this diff, I have rewritten some of the code to use the array class, which seems more "pythonic" (code seems to read better) but I find that it is sometimes almost twice as slow. Is this an inevitable performance hit for the abstraction, or am I doing something obviously wrong?

To get a proper working copy of the code:

git clone git://github.com/hemmer/pyDLA.git


And to checkout the array version use (afterwards):

git checkout numpyarray_slow


Any advice, however small, is greatly appreciated!

EDIT: Here is the source of the older (faster code). For the changes I made to use numpy arrays, please see the diff above.

```
#!/usr/bin/env python
import time
from math import pi, sqrt, cos, sin
from random import choice, random, seed

from pylab import pcolormesh, axes, show
from numpy import zeros, int32, arange

twopi = 2 * pi # 2 pi for choosing starting position

hits = 0 # how many seeds have stuck
birthradius = 5 # starting radius for walkers
deathradius = 10 # radius to kill off walkers
maxradius = -1 # the extent of the growth
numParticles = 1000 # how many walkers to release
seed(42) # fixed seed for debugging

L = 500 # lattice goes from -L : L
size = (2 * L) + 1 # so total lattice width is 2L + 1

# preallocate and initialise centre point as "seed"
lattice = zeros((size, size), dtype=int32)
lattice[L, L] = -1

# possible (relative) nearest neighbour sites
nnsteps = ((0, 1), (0, -1), (1, 0), (-1, 0))

# returns whether site pos = (x, y) has
# an occupied nearest-neighbour site
def nnOccupie

Solution


  • Use of global variables is frowned upon. You should probably define a class, and hold the data as attributes on it.



  • Don't put logic at the module level, i.e. your main for loop. That is better placed inside of a main() function



  • In your diff, you use python's sum rather then numpy's sum. when dealing with numpy array's numpy's sum will be much faster. Basically, you should never use python math functions on numpy arrays. Always use the numpy equivalents.



  • Numpy is really not designed for having small arrays of two elements. Its really focused on having larger arrays so you are going to have trouble getting the speed benefits.



  • You should avoid writing loops over numpy arrays, instead you should use numpy's vector operation features



To get an effective use of numpy, you probably need if possible to process multiple particles in parallel so that you are operating on large arrays. The numpy's processing abilities will be able to help.

EDIT: Quick Sample of numpy vector operations

angles = numpy.random.rand(numParticles) * twopi
positions = numpy.empty((numParticles, 2))
positions[0,:] = numpy.sin(angles) * birthradius
positions[1,:] = numpy.cos(angles) * birthradius


This constructs a 2D array of positions. No python loops are used, instead operations over arrays are used. I'm not seeing a good way to use it in your case. Ordinarilly, I'd use them to run all particles at the same time. But you have dependency between your different runs.

More thoughts:

  • By keeping an array of bools marking the positions next to "hit" locations, you can avoid checking the neighbours in nnOccupied. Its cheaper to update the "next_to" array when a hit is detected.



  • Using two coordinate systems 0-based and L-based, confusion results. Standardise on one.



  • Setting isDead and using break is redundant. Pick one and stick with it



  • Don't if expr: return True else return False just return it



  • Why is nnsteps redecared in registerHit?



EDIT: My rewrite of your code

```
#!/usr/bin/env python
import time

from pylab import pcolormesh, axes, show
import numpy as np

L = 500 # lattice goes from -L : L
SIZE = (2 * L) + 1 # so total lattice width is 2L + 1
NNSTEPS = np.array([
(0, 1),
(0, -1),
(1, 0),
(-1, 0)
])
CHUNK_SIZE = 1000

class Simulation(object):
def __init__(self, num_particles):
self.hits = 0
self.birthradius = 5
self.deathradius = 10
self.maxradius = -1

self.lattice = np.zeros((SIZE, SIZE), dtype=np.int32)
self.lattice[L, L] = -1
# initialize center point as the seed

self.hit_zone = np.zeros((SIZE, SIZE), dtype=bool)
self.hit_zone[L, L] = True
self.hit_zone[ NNSTEPS[:,0] + L, NNSTEPS[:,1] + L ] = True
# the hit_zone is all of the position next to places that have
# actually hit, these count as hits

self.num_particles = num_particles

def register_hit(self, pos):
neighbors = NNSTEPS + pos
self.hit_zone[neighbors[:,0], neighbors[:,1]] = True
# update hit_zone

# check if this "hit" extends the max radius
norm2 = (pos[0] - L)2 + (pos[1] - L)2
if norm2 > self.maxradius ** 2:
self.maxradius = int(np.sqrt(norm2))
self.birthradius = min(self.maxradius + 5, L)
self.deathradius = min(self.maxradius + 20, L)

self.hits += 1
self.lattice[pos] = self.hits

def run(self):
for particle in xrange(self.num_particles):
# find angle on [0, 2pi)
angle = np.random.random() 2 np.pi
# and convert to a starting position, pos = (x, y),
# on a circle of radius "birthradius" around the centre seed
pos = (np.sin(angle) self.birthradius + L, np.cos(angle) self.birthradius + L)

while True:
moves = np.random.randint(0, 4, CHUNK_SIZE) # pick the move numbers
moves = np.cumsum(NNSTEPS[moves], axis = 0) # grab the move and do a sum

# add starting position to all of that
moves[:,0] += pos[0]
moves[:,1] += pos[1]

# calculate distance to center for all the points
from_center = moves - L
distances_to_center = from_center[:,0]2 + from_center[:,1]2
alive = distances_to_center < self.deathradius ** 2
alive = np.minimum.accumulate(alive)

particle_hits = self.hit_zone[moves[:,0], moves[:,1]]

if np.any(particle_hits):
first_hit = particle_hits.nonzero()[0][0]
if alive[first_hit]:
pos = tuple(moves[first_hit])
self.register_hit(pos)
break
else:
if np.all(alive):
pos = tuple(moves[-1])
else:

Code Snippets

angles = numpy.random.rand(numParticles) * twopi
positions = numpy.empty((numParticles, 2))
positions[0,:] = numpy.sin(angles) * birthradius
positions[1,:] = numpy.cos(angles) * birthradius
#!/usr/bin/env python
import time

from pylab import pcolormesh, axes, show
import numpy as np

L = 500                     # lattice goes from -L : L
SIZE = (2 * L) + 1          # so total lattice width is 2L + 1
NNSTEPS = np.array([
    (0, 1), 
    (0, -1), 
    (1, 0), 
    (-1, 0)
])
CHUNK_SIZE = 1000

class Simulation(object):
    def __init__(self, num_particles):
        self.hits = 0
        self.birthradius = 5
        self.deathradius = 10
        self.maxradius = -1

        self.lattice = np.zeros((SIZE, SIZE), dtype=np.int32)
        self.lattice[L, L] = -1
        # initialize center point as the seed

        self.hit_zone = np.zeros((SIZE, SIZE), dtype=bool)
        self.hit_zone[L, L] = True
        self.hit_zone[ NNSTEPS[:,0] + L, NNSTEPS[:,1] + L ] = True
        # the hit_zone is all of the position next to places that have
        # actually hit, these count as hits

        self.num_particles = num_particles

    def register_hit(self, pos):
        neighbors = NNSTEPS + pos
        self.hit_zone[neighbors[:,0], neighbors[:,1]] = True
        # update hit_zone

        # check if this "hit" extends the max radius
        norm2 = (pos[0] - L)**2 + (pos[1] - L)**2
        if norm2 > self.maxradius ** 2:
            self.maxradius = int(np.sqrt(norm2))
            self.birthradius = min(self.maxradius + 5, L)
            self.deathradius = min(self.maxradius + 20, L)

        self.hits += 1
        self.lattice[pos] = self.hits


    def run(self):
        for particle in xrange(self.num_particles):
            # find angle on [0, 2pi)
            angle = np.random.random() * 2 * np.pi
            # and convert to a starting position, pos = (x, y),
            # on a circle of radius "birthradius" around the centre seed
            pos = (np.sin(angle) * self.birthradius + L, np.cos(angle) * self.birthradius + L)

            while True:
                moves = np.random.randint(0, 4, CHUNK_SIZE) # pick the move numbers
                moves = np.cumsum(NNSTEPS[moves], axis = 0) # grab the move and do a sum

                # add starting position to all of that
                moves[:,0] += pos[0]
                moves[:,1] += pos[1]

                # calculate distance to center for all the points
                from_center = moves - L
                distances_to_center = from_center[:,0]**2 + from_center[:,1]**2
                alive = distances_to_center < self.deathradius ** 2
                alive = np.minimum.accumulate(alive)

                particle_hits = self.hit_zone[moves[:,0], moves[:,1]]

                if np.any(particle_hits):
                    first_hit = particle_hits.nonzero()[0][0]
                    if alive[first_hit]:
                        pos = tuple(moves[first_hit])
                        self.register_hit(pos)
                        break
                else:
                    if np.all(alive):
                        pos = tuple(moves[-1])
                    else:
                  
moves = np.random.randint(0, 4, CHUNK_SIZE) # pick the move numbers
moves = np.cumsum(NNSTEPS[moves], axis = 0) # grab the move and do a sum
# add starting position to all of that
                moves[:,0] += pos[0]
                moves[:,1] += pos[1]

Context

StackExchange Code Review Q#4336, answer score: 2

Revisions (0)

No revisions yet.