How can I optimize this Monte Carlo simulation running at 10,000,000 iterations?

Submitted by: @import:stackexchange-codereview

Problem

I am writing this Monte Carlo simulation, and it becomes slow when I run it at 10,000,000 iterations. Here is the code:

import random as rnd
from time import time

#get user input on number of iterations
numOfIterations = raw_input('Enter the number of iterations: ')
numOfIterations = int(numOfIterations)

start = time()

#initialize bag (44 green, 20 blue, 15 yellow,  11 red, 2 white, 1 black
#a counter
#and question counter
bag = 44*'g'+ 20*'b' + 15*'y' + 11*'r' + 2*'w' + 'k'
counter = {'g':0, 'b':0,'y':0 ,'r':0,'w':0,'k':0}
question = {'one':0,'two':0,'three':0,'four':0,'five':0}

for i in range(0,numOfIterations):
  for j in xrange(0,5):
    draw = rnd.sample(bag,5)
    for x in draw: counter[x]+=1
  if counter['w'] >0 and counter['k'] >0: question['one']+=1
  if counter['b'] > counter['r']: question['two']+=1
  if counter['b'] > counter['y']: question['three']+=1
  if counter['y'] > counter['r']: question['four']+=1
  if counter['g'] < (counter['b']+counter['y']+counter['r']+counter['w']+counter['k']): question['five']+=1
  for k in counter: counter[k] = 0

p1 = float(question['one'])/float(numOfIterations)
p2 = float(question['two'])/float(numOfIterations)
p3 = float(question['three'])/float(numOfIterations)
p4 = float(question['four'])/float(numOfIterations)
p5 = float(question['five'])/float(numOfIterations)

print 'Q1 \t Q2 \t Q3 \t Q4 \t Q5'
print str(p1)+'\t'+str(p2)+'\t'+str(p3)+'\t'+str(p4)+'\t'+str(p5)

end = time()

print 'it took ' +str(end-start)+ ' seconds'


Any suggestions or criticism would be appreciated.

Solution

import random as rnd


I dislike abbreviations like this; they make the code harder to read.

from time import time

#get user input on number of iterations
numOfIterations = raw_input('Enter the number of iterations: ')
numOfIterations = int(numOfIterations)


Any reason you didn't combine these two lines?
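
As a minimal sketch of the combined form: the names read_line and read_iterations are illustrative, not part of the original code, and the try/except shim is only there so the snippet runs on both Python 2 (raw_input) and Python 3 (input).

```python
try:
    read_line = raw_input  # Python 2, as in the original code
except NameError:
    read_line = input      # Python 3 renamed raw_input to input


def read_iterations(read=read_line):
    # Prompt and convert to int in a single expression.
    return int(read('Enter the number of iterations: '))
```

Passing the reader in as a parameter is an assumption made for testability; calling read_iterations() with no argument prompts on the console as before.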

start = time()

#initialize bag (44 green, 20 blue, 15 yellow,  11 red, 2 white, 1 black
#a counter
#and question counter
bag = 44*'g'+ 20*'b' + 15*'y' + 11*'r' + 2*'w' + 'k'
counter = {'g':0, 'b':0,'y':0 ,'r':0,'w':0,'k':0}
question = {'one':0,'two':0,'three':0,'four':0,'five':0}


Looking up your data by string keys all the time is going to be somewhat slower than indexing into a list. Instead, I'd suggest you keep lists and store the data that way.
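
One way this could look, as a sketch rather than a measured improvement: each color gets a small integer code that doubles as a list index. The constant names and the tally helper are hypothetical; whether this is noticeably faster than the dictionary version should be timed on your own machine.

```python
# Integer codes double as list indices (names are illustrative).
GREEN, BLUE, YELLOW, RED, WHITE, BLACK = range(6)

# Same bag contents as the original: 44 green, 20 blue, 15 yellow,
# 11 red, 2 white, 1 black.
BAG = ([GREEN] * 44 + [BLUE] * 20 + [YELLOW] * 15 +
       [RED] * 11 + [WHITE] * 2 + [BLACK])


def tally(marbles):
    """Count how many marbles of each color appear in marbles."""
    counts = [0] * 6
    for color in marbles:
        counts[color] += 1
    return counts
```

With this layout, counter['w'] becomes counts[WHITE], and resetting the counter is just building a fresh [0] * 6.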

for i in range(0,numOfIterations):


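Given that numOfIterations will be very large, it's probably a good idea to use xrange here: in Python 2, range builds the whole list in memory first, while xrange yields the numbers one at a time.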

  for j in xrange(0,5):


You should generally put logic inside a function. That is especially true for any sort of loop, as it will run faster in a function: CPython resolves local variables faster than globals.
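
A sketch of the simulation moved into a function, keeping the original logic (five handfuls of five marbles per trial). The function name simulate and the rng parameter are assumptions added for illustration; the code is written so it also runs on Python 3, where range is already lazy.

```python
import random

BAG = 44 * 'g' + 20 * 'b' + 15 * 'y' + 11 * 'r' + 2 * 'w' + 'k'


def simulate(num_iterations, rng=random):
    """Return the five question tallies as a list [Q1, ..., Q5]."""
    sample = rng.sample      # local alias avoids a repeated attribute lookup
    bag = list(BAG)
    question = [0] * 5
    for trial in range(num_iterations):   # use xrange on Python 2
        counter = dict.fromkeys('gbyrwk', 0)
        for handful in range(5):
            for marble in sample(bag, 5):
                counter[marble] += 1
        g, b, y = counter['g'], counter['b'], counter['y']
        r, w, k = counter['r'], counter['w'], counter['k']
        if w > 0 and k > 0:
            question[0] += 1
        if b > r:
            question[1] += 1
        if b > y:
            question[2] += 1
        if y > r:
            question[3] += 1
        if g < b + y + r + w + k:
            question[4] += 1
    return question
```

Passing random.Random(some_seed) as rng makes a run reproducible; leaving it out uses the module-level generator as the original does.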

    draw = rnd.sample(bag,5)
    for x in draw: counter[x]+=1


I dislike putting the contents of the loop on the same line. I think it makes it harder to read.

  if counter['w'] >0 and counter['k'] >0: question['one']+=1
  if counter['b'] > counter['r']: question['two']+=1
  if counter['b'] > counter['y']: question['three']+=1
  if counter['y'] > counter['r']: question['four']+=1
  if counter['g'] < (counter['b']+counter['y']+counter['r']+counter['w']+counter['k']): question['five']+=1
  for k in counter: counter[k] = 0

p1 = float(question['one'])/float(numOfIterations)
p2 = float(question['two'])/float(numOfIterations)
p3 = float(question['three'])/float(numOfIterations)
p4 = float(question['four'])/float(numOfIterations)
p5 = float(question['five'])/float(numOfIterations)


Don't create five separate variables; create a list. Also, if you add the line from __future__ import division at the beginning of the file, then dividing two ints will produce a float, and you won't need to convert them to floats here.
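
For instance, assuming the tallies are already held in a list, the five divisions collapse into one comprehension. The helper name to_probabilities is hypothetical.

```python
def to_probabilities(tallies, num_iterations):
    # On Python 2, put "from __future__ import division" as the first
    # statement of the file so that "/" is true division; Python 3
    # already behaves that way. Without it, Python 2 would truncate to 0.
    return [tally / num_iterations for tally in tallies]
```

The same function then works unchanged if a sixth question is added later.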

print 'Q1 \t Q2 \t Q3 \t Q4 \t Q5'
print str(p1)+'\t'+str(p2)+'\t'+str(p3)+'\t'+str(p4)+'\t'+str(p5)


See, if you had stored the probabilities in a list, this would be much easier.
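
A sketch of what the output step might become with a list of probabilities. The format_results name is made up for this example, and the header spacing differs slightly from the original (plain tabs, no surrounding spaces).

```python
def format_results(probabilities):
    """Build the header row and the value row, both tab-separated."""
    header = '\t'.join('Q%d' % n for n in range(1, len(probabilities) + 1))
    values = '\t'.join(str(p) for p in probabilities)
    return header + '\n' + values
```

Printing the returned string reproduces the two output lines without naming each value individually.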

end = time()

print 'it took ' +str(end-start)+ ' seconds'


For speed improvements you want to look at using numpy. It allows implementing efficient operations over arrays.
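
One possible vectorized version, offered as a sketch and not benchmarked here. It assumes NumPy 1.18 or later (for Generator.multivariate_hypergeometric), and the name simulate_numpy is illustrative. Each handful of five marbles drawn without replacement is generated directly as a vector of color counts, so no Python-level loop over trials remains.

```python
import numpy as np

# Marble counts in the order green, blue, yellow, red, white, black.
COLORS = [44, 20, 15, 11, 2, 1]


def simulate_numpy(num_iterations, seed=None):
    """Estimate the five probabilities with vectorized NumPy operations."""
    rng = np.random.default_rng(seed)
    # Each trial is five independent handfuls of five marbles drawn without
    # replacement, so the result has shape (num_iterations, 5, 6).
    handfuls = rng.multivariate_hypergeometric(
        COLORS, 5, size=(num_iterations, 5))
    counts = handfuls.sum(axis=1)        # per-trial color totals
    g, b, y, r, w, k = counts.T          # one array per color
    hits = np.stack([
        (w > 0) & (k > 0),
        b > r,
        b > y,
        y > r,
        g < b + y + r + w + k,
    ])
    return hits.mean(axis=1)             # fraction of trials per question
```

For 10,000,000 iterations the intermediate array is large, so it may be worth calling this in batches (say, one million trials at a time) and averaging the results.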

In this precise case, I'd use a multinomial distribution and solve the problem analytically rather than using Monte Carlo.
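
To illustrate the analytic route for the first question only: because rnd.sample draws each handful without replacement, this sketch uses hypergeometric counting for a single handful instead of a multinomial, then combines the five independent handfuls with inclusion-exclusion. The function name and parameters are assumptions for the example, and math.comb needs Python 3.8 or later.

```python
from math import comb  # Python 3.8+


def prob_white_and_black(handfuls=5, handful_size=5, total=93,
                         white=2, black=1):
    """Exact P(at least one white and at least one black) over all handfuls."""
    ways = comb(total, handful_size)
    # Probability that a single handful avoids the named marbles.
    no_white = comb(total - white, handful_size) / ways
    no_black = comb(total - black, handful_size) / ways
    neither = comb(total - white - black, handful_size) / ways
    # Handfuls are independent, so raise to the number of handfuls, then
    # apply inclusion-exclusion to the events "no white" and "no black".
    return (1 - no_white ** handfuls - no_black ** handfuls
            + neither ** handfuls)
```

With the default arguments this comes out at roughly 0.10, which gives a reference value to check the simulated Q1 against. The comparison questions (Q2 to Q5) would need the joint distribution of the color counts and are not covered by this sketch.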

Context

StackExchange Code Review Q#6311, answer score: 8
