Class used for stochastic epidemic simulations
Problem
I've developed a class used for some epidemic simulations I'm doing. Individuals are 'S' (susceptible), 'I' (infected), or 'R' (recovered). These are standard abbreviations in the research community. I assume some calculations are done in advance to determine who will be infected (and recover) when. There will be many different ways I do this calculation depending on the population studied.
Some questions I have:
- Should I worry that I'm passing in some dicts that could later be edited by the user? What's the best way to prevent this?
- Any advice on how to make this clean? I'm moderately familiar with Python, but I'm self-taught, and this is the first time I'm seriously working with classes.
```
import scipy
import pylab as py
from collections import Counter

class SIREpidemic(object):
    """
    This will have the basic commands we want for any variety of SIR epidemic.
    When an epidemic is initialized, we will have already calculated the time of
    infection and recovery for each individual. These will be passed in as dicts.
    Each individual, if infected will have individual.inftime as the time of infection
    and individual.rectime as time of recovery.
    """
    def __init__(self, infectionTime, recoveryTime, N):
        '''Process the infectionTime and RecoveryTime to determine who is susceptible
        when. infectionTime[individual] gives time of infection of individual.
        recoveryTime[individual] gives recovery time of individual (the keys for both
        lists must match). N is the total population size (may include individuals
        that are never infected.'''
        statusTypes = ['S', 'I', 'R']
        self._timeSeries = {status:scipy.array([]) for status in statusTypes}
        self._timeSeries['t'] = scipy.array([])
        self.N = N
        self._infectionTime = infectionTime
        self._recoveryTime = recoveryTime
        infTimes = [infectionTime[individual] for individual in infectionTime.keys()]
```
Solution
Python coding conventions
There are a few places in the code where you have the following:

```
if y == None and x == None:
```

The Pythonic way to check against `None` is to use `is`:

```
if y is None and x is None:
if fid is not None:
```

Note that this works because `None` is always the same object.

There are a few other minor formatting issues that aren't considered idiomatic Python code style (for example, no space after a comma). The PEP8 document outlines some standard Python coding conventions. I would recommend giving that a read.
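To illustrate why `is` is the safer check against `None`, here is a minimal sketch. The class `AlwaysEqual` is hypothetical (it is not part of the code under review); it overrides `__eq__` so that `== None` gives a misleading answer, while the identity check is unaffected.

```python
class AlwaysEqual(object):
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True


value = AlwaysEqual()
equals_none = (value == None)  # dispatches to AlwaysEqual.__eq__
is_none = (value is None)      # identity check, cannot be overridden
print(equals_none, is_none)
```

The equality test reports a match even though `value` is clearly not `None`; the identity test does not.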
Constants
There are a few unnamed constants floating around in the code; for example, `'t'` appears in a bunch of different places. Creating named variables for these is generally a good thing for maintainability. Also, `statusTypes = ['S', 'I', 'R']` isn't specific to one particular instance of the class, so I would move it outside of `__init__`.
Generator expressions
If you don't need a list, you don't have to create it:

```
infTimes = [infectionTime[individual] for individual in infectionTime.keys()]
recTimes = [recoveryTime[individual] for individual in infectionTime.keys()]
infTimeCounter = Counter(infTimes)
recTimeCounter = Counter(recTimes)
```

Here you have to build two lists in memory before you use the `Counter`. Given that you don't actually need the entire lists here, I would instead opt for generator expressions:

```
infTimes = (infectionTime[individual] for individual in infectionTime.keys())
recTimes = (recoveryTime[individual] for individual in infectionTime.keys())
infTimeCounter = Counter(infTimes)
recTimeCounter = Counter(recTimes)
```

This allows you to count all the frequencies without having to hold the whole list in memory at once. Since you never need that list, this saves memory, which is especially beneficial if those dictionaries are large.

Then you build a sorted list and remove duplicates:

```
self._timeSeries['t'] = scipy.array(sorted(set(infTimes+recTimes)))
```

The deduplication was already done by the `Counter` objects, though, since their keys are the distinct times. (This line would also stop working once `infTimes` and `recTimes` are generators: generators cannot be concatenated with `+`, and they are exhausted after the `Counter` has consumed them.) You could do something like this instead:

```
combined_count = infTimeCounter + recTimeCounter
self._timeSeries['t'] = scipy.array(sorted(combined_count))
```

Iterating over a `Counter` yields each distinct key once, so no separate `set` is needed. If you have a very large number of duplicated elements, this optimization could save you quite a bit of processing.
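A small self-contained sketch of the counting and deduplication idea above. The dict contents are made-up example data, the snake_case names are my own, and a plain sorted list stands in for the scipy array; treat it as an illustration under those assumptions rather than the class's actual code.

```python
from collections import Counter

# Made-up example data: individual -> time of infection / recovery.
infection_time = {'a': 0.0, 'b': 1.5, 'c': 1.5}
recovery_time = {'a': 1.5, 'b': 3.0, 'c': 4.0}

# Generator expressions feed Counter one value at a time; no list is built.
inf_time_counter = Counter(infection_time[ind] for ind in infection_time)
rec_time_counter = Counter(recovery_time[ind] for ind in infection_time)

# Adding Counters sums the counts per key. Iterating the result yields
# each distinct time once, so sorted() gives the unique event times.
combined_count = inf_time_counter + rec_time_counter
event_times = sorted(combined_count)
print(event_times)
```

The counts themselves remain available in `combined_count` if the number of events at each time is needed later.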
Documentation
There are a few parts of the code that are not immediately obvious and would benefit from a docstring. In particular, it's not clear exactly what these are doing:

```
def size(self):
    return self._timeSeries['R'][-1]

def initial_size(self):
    return 1-self._timeseries['S'][0]
```

Note also that `initial_size` reads `self._timeseries`, while `__init__` assigns `self._timeSeries`. That looks like a typo, and unless the lowercase attribute is defined elsewhere it would raise an `AttributeError`.

plot method
The `plot` method could really do with an explanation of what `x`, `y` and `fid` are in the docstring. The docstring could also be better formatted, as it's very wide.

```
def plot(self,x=None, y=None, fid=None):
    if y == None and x == None:
            self.plot('t', 'S')
            self.plot('t', 'I')
            self.plot('t', 'R')
```

First, the indentation and formatting are off here. Whitespace is significant in Python, so the 8-character-wide indent is a bit odd; just make it all consistent. Adding an extra named variable might help readability here. Here's what it looks like with those changes:
```
def plot(self, x=None, y=None, fid=None):
    if y is None and x is None:
        for status in statusTypes:
            self.plot('t', status)
    elif y is None:
        y_label = x
        x_label = 't'
    elif x is None:
        y_label = y
        x_label = 't'
    else:
        y_label = y
        x_label = x
    if fid is not None:
        py.figure(fid)
```
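The loop over `statusTypes` in the rewritten `plot` only works if that list is reachable from inside the method, which is what moving it out of `__init__` (see Constants) achieves. Here is a minimal sketch of one way to do that with class-level constants. The names `STATUS_TYPES` and `TIME_KEY`, and the stripped-down `EpidemicSketch` class, are assumptions for illustration and not the original class.

```python
class EpidemicSketch(object):
    """Stripped-down stand-in for SIREpidemic, for illustration only."""

    # Shared by every instance, so defined once on the class.
    STATUS_TYPES = ('S', 'I', 'R')
    TIME_KEY = 't'

    def __init__(self):
        # One empty series per status, plus one for the time axis.
        self._time_series = {status: [] for status in self.STATUS_TYPES}
        self._time_series[self.TIME_KEY] = []

    def series_names(self):
        """Return the (x, y) key pairs a plot-everything loop would use."""
        return [(self.TIME_KEY, status) for status in self.STATUS_TYPES]


epidemic = EpidemicSketch()
print(epidemic.series_names())
```

With the constants on the class, the string `'t'` and the status list each live in one place, and methods reach them through `self`.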
Context
StackExchange Code Review Q#73790, answer score: 8