patternpythonMinor
Epidemic simulation over a large population
Viewed 0 times
simulationepidemiclargepopulationover
Problem
This is a follow-up to my question about epidemic simulation. The accepted answer works, but also mentions that I should look at modifying the external_function_call. Here's the complete working code, but it still takes quite a long time to run - in fact, it's slower than what I originally posted.
Here are the first few lines from the profiler, assuming a population size 100 (it's actually 625, which makes the code extremely slow):
I'm very sure I'm not using lists as I should, which may be a part of the problem - checkEqual is checking whether all elements in the list are equal to 0.
And here's the code. The purpose of changing the list element to 0 was to keep the spot in the list, but exclude that element from the list of susceptible individuals at time t. I'm also not convinced I should define dist1 as I do, in the loop, but I can't see another way to do it. This code is far too slow, even with the improvements suggested in the epidemic simulation question. How can I make it faster?
```
import numpy as np
import scipy
from scipy import spatial
import json
import itertools
import collections
MAX_FAILED_ATTEMPS = 10
def create_pop(meanval, sigmaval, pop):
x_pos, y_pos = np.random.multivariate_normal(meanval, sig
Here are the first few lines from the profiler, assuming a population size 100 (it's actually 625, which makes the code extremely slow):
11185402 function calls (11184367 primitive calls) in 25.228 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.108 0.108 25.229 25.229 20Dec_for_codereview.py:1()
1 0.000 0.000 0.530 0.530 20Dec_for_codereview.py:10(create_pop)
6585 0.019 0.000 0.019 0.000 20Dec_for_codereview.py:26(checkEqual)
748 10.670 0.014 16.705 0.022 20Dec_for_codereview.py:29(powerlaw_epidemic)
58 0.002 0.000 0.013 0.000 20Dec_for_codereview.py:66(inf_per_count_time)
6321 0.005 0.000 0.005 0.000 20Dec_for_codereview.py:70()
1 0.008 0.008 16.731 16.731 20Dec_for_codereview.py:74(write_powerlaw_epidemics)I'm very sure I'm not using lists as I should, which may be a part of the problem - checkEqual is checking whether all elements in the list are equal to 0.
And here's the code. The purpose of changing the list element to 0 was to keep the spot in the list, but exclude that element from the list of susceptible individuals at time t. I'm also not convinced I should define dist1 as I do, in the loop, but I can't see another way to do it. This code is far too slow, even with the improvements suggested in the epidemic simulation question. How can I make it faster?
```
import numpy as np
import scipy
from scipy import spatial
import json
import itertools
import collections
MAX_FAILED_ATTEMPS = 10
def create_pop(meanval, sigmaval, pop):
x_pos, y_pos = np.random.multivariate_normal(meanval, sig
Solution
Indeed, you seem to not use lists as you should. You seems to not use dictionary as you shoud either. In fact, you picked a too complicated data structure for the task at hand.
The aim of your dictionary is to store the date at which people got infected based on the index of the person. The aim of your list is to store a list of uninfected indexes. This is way too contrived as you could associate indexes to a value using just a list (and
If you use only a single list, storing for each index the date at which the person (for this index) got infected, you can simplify your computation a lot. On top of that, when computing the people infected at time
In the following rewrite, I use a list of infection dates, the value 0 meaning the person never got infected yet:
A few things to note:
-
Since you use
You may be able to use
Now you may notice that I return an array where you returned a dictionnary, this means you have to adapt your calling code to handle the change.
Using various tips that I already gave in my previous answers, you can end up with the following code:
```
import csv
import itertools
import collections
import argparse
import numpy as np
import scipy.spatial
MAX_FAILED_ATTEMPTS = 50
EpidemyStatistics = collections.namedtuple(
'EpidemyStatistics',
'susceptibility transmissibility infectious epsilon count infected')
def create_population(pop, meanval=None, sigmaval=None):
if meanval is None:
meanval = np.array([0., 0.])
if sigmaval is None:
sigmaval = np.array([[1., 0.], [0., 1.]])
xyarray = np.random.multivariate_normal(meanval, sigmaval, pop)
pdistance = scipy.spatial.distance.pdist(xyarray)
full_mat = scipy.spatial.distance.squareform(pdistance)
with open('20Dec_population.txt', 'w') as f:
np.savetxt(f, xyarray, fmt=['%f', '%f'])
return full_mat
def powerlaw_epidemic(pop, susc, trans, inf_period, eps, full_mat):
"""Compute a list of infection date for each people in the population.
A value of 0 meaning the person never got infected during the simulation.
"""
infection_dates = np.zeros(pop, dtype=int)
initial_patient = np.random.randint(1, pop)
infection_dates[initial_patient] = 1 # First got infected on day 1
for day in itertools.count(2):
if infection_dates.all():
# Stop the si
The aim of your dictionary is to store the date at which people got infected based on the index of the person. The aim of your list is to store a list of uninfected indexes. This is way too contrived as you could associate indexes to a value using just a list (and
enumerate).If you use only a single list, storing for each index the date at which the person (for this index) got infected, you can simplify your computation a lot. On top of that, when computing the people infected at time
t, you include contagious people that were computed in the range \$[t - \text{inf_period}; t[\$, and to do so, you explicitly exclude people contaminated at time t. Instead, you could compute the list of infectious people before iterating over susceptible to avoid to:- Exclude those contaminated at time
t;
- Recomputing the same list over and over for each people in the population.
In the following rewrite, I use a list of infection dates, the value 0 meaning the person never got infected yet:
def powerlaw_epidemic(pop, susc, trans, inf_period, eps, full_mat):
"""Compute a list of infection date for each people in the population.
A value of 0 meaning the person never got infected during the simulation.
"""
infection_dates = np.zeros(pop, dtype=int)
initial_patient = np.random.randint(1, pop)
infection_dates[initial_patient] = 1 # First got infected on day 1
for day in itertools.count(2):
if infection_dates.all():
# Stop the simulation if everyone was infected
return infection_dates
still_virulent = max(day - inf_period, 1)
infectious_people = infection_dates >= still_virulent
if not infectious_people.any():
# Stop the simulation if noone is infectious anymore
return infection_dates
# Compute newly infected persons
for person, infected_at in enumerate(infection_dates):
if not infected_at:
dist = full_mat[person, infectious_people] ** (-trans)
probinf = 1 - np.exp(-susc * dist.sum() + eps)
if probinf >= np.random.uniform(0, 1):
infection_dates[person] = dayA few things to note:
- You don't need to use a test to differentiate the first day from the other ones, just do the special case before the loop to avoid loosing time in a useless test in the loop.
-
Since you use
numpy in various places, you can improve the computation a bit, especially extracting out specific indexes out of full_mat or computing the indexes of infectious_people.You may be able to use
numpy features to improve the update of infection_dates (instead of using a for loop), but I don't know numpy enough to provide them.Now you may notice that I return an array where you returned a dictionnary, this means you have to adapt your calling code to handle the change.
inf_per_count_time is easy to adapt as it only counts the number of people infected on a given day: this calls for collections.Counter. But you will also need to adapt your condition to know if a simulation was "virulent enough". Better write a function for that:def is_virulent_enough(simulation, infected=10, duration=10):
return (simulation > 0).sum() >= infected and simulation.max() >= durationcreate_pop could also be simplified as zip and .T cancel each other; making some re-creation of data superfluous.Using various tips that I already gave in my previous answers, you can end up with the following code:
```
import csv
import itertools
import collections
import argparse
import numpy as np
import scipy.spatial
MAX_FAILED_ATTEMPTS = 50
EpidemyStatistics = collections.namedtuple(
'EpidemyStatistics',
'susceptibility transmissibility infectious epsilon count infected')
def create_population(pop, meanval=None, sigmaval=None):
if meanval is None:
meanval = np.array([0., 0.])
if sigmaval is None:
sigmaval = np.array([[1., 0.], [0., 1.]])
xyarray = np.random.multivariate_normal(meanval, sigmaval, pop)
pdistance = scipy.spatial.distance.pdist(xyarray)
full_mat = scipy.spatial.distance.squareform(pdistance)
with open('20Dec_population.txt', 'w') as f:
np.savetxt(f, xyarray, fmt=['%f', '%f'])
return full_mat
def powerlaw_epidemic(pop, susc, trans, inf_period, eps, full_mat):
"""Compute a list of infection date for each people in the population.
A value of 0 meaning the person never got infected during the simulation.
"""
infection_dates = np.zeros(pop, dtype=int)
initial_patient = np.random.randint(1, pop)
infection_dates[initial_patient] = 1 # First got infected on day 1
for day in itertools.count(2):
if infection_dates.all():
# Stop the si
Code Snippets
def powerlaw_epidemic(pop, susc, trans, inf_period, eps, full_mat):
"""Compute a list of infection date for each people in the population.
A value of 0 meaning the person never got infected during the simulation.
"""
infection_dates = np.zeros(pop, dtype=int)
initial_patient = np.random.randint(1, pop)
infection_dates[initial_patient] = 1 # First got infected on day 1
for day in itertools.count(2):
if infection_dates.all():
# Stop the simulation if everyone was infected
return infection_dates
still_virulent = max(day - inf_period, 1)
infectious_people = infection_dates >= still_virulent
if not infectious_people.any():
# Stop the simulation if noone is infectious anymore
return infection_dates
# Compute newly infected persons
for person, infected_at in enumerate(infection_dates):
if not infected_at:
dist = full_mat[person, infectious_people] ** (-trans)
probinf = 1 - np.exp(-susc * dist.sum() + eps)
if probinf >= np.random.uniform(0, 1):
infection_dates[person] = daydef is_virulent_enough(simulation, infected=10, duration=10):
return (simulation > 0).sum() >= infected and simulation.max() >= durationimport csv
import itertools
import collections
import argparse
import numpy as np
import scipy.spatial
MAX_FAILED_ATTEMPTS = 50
EpidemyStatistics = collections.namedtuple(
'EpidemyStatistics',
'susceptibility transmissibility infectious epsilon count infected')
def create_population(pop, meanval=None, sigmaval=None):
if meanval is None:
meanval = np.array([0., 0.])
if sigmaval is None:
sigmaval = np.array([[1., 0.], [0., 1.]])
xyarray = np.random.multivariate_normal(meanval, sigmaval, pop)
pdistance = scipy.spatial.distance.pdist(xyarray)
full_mat = scipy.spatial.distance.squareform(pdistance)
with open('20Dec_population.txt', 'w') as f:
np.savetxt(f, xyarray, fmt=['%f', '%f'])
return full_mat
def powerlaw_epidemic(pop, susc, trans, inf_period, eps, full_mat):
"""Compute a list of infection date for each people in the population.
A value of 0 meaning the person never got infected during the simulation.
"""
infection_dates = np.zeros(pop, dtype=int)
initial_patient = np.random.randint(1, pop)
infection_dates[initial_patient] = 1 # First got infected on day 1
for day in itertools.count(2):
if infection_dates.all():
# Stop the simulation if everyone was infected
return infection_dates
still_virulent = max(day - inf_period, 1)
infectious_people = infection_dates >= still_virulent
if not infectious_people.any():
# Stop the simulation if noone is infectious anymore
return infection_dates
# Compute newly infected persons
for person, infected_at in enumerate(infection_dates):
if not infected_at:
dist = full_mat[person, infectious_people] ** (-trans)
probinf = 1 - np.exp(-susc * dist.sum() + eps)
if probinf >= np.random.uniform(0, 1):
infection_dates[person] = day
def is_virulent_enough(simulation, infected=10, duration=10):
return (simulation > 0).sum() >= infected and simulation.max() >= duration
def write_powerlaw_epidemics(population, susc, trans, inf_period, eps, repetitions, full_mat):
epidemies = []
count = 0
parameters_product = itertools.product(susc, trans, inf_period, eps)
for susceptibility, transmissibility, infectious_period, epsilon in parameters_product:
while True:
for rep in range(repetitions):
for _ in range(MAX_FAILED_ATTEMPTS):
infection_dates = powerlaw_epidemic(
population, susceptibility, transmissibility,
infectious_period, epsilon, full_mat)
if is_virulent_enough(infection_dates):
epidemies.append(EpidemyStatistics(
susceptibility,
transmissibility,
infectious_period,
epContext
StackExchange Code Review Q#150414, answer score: 2
Revisions (0)
No revisions yet.