patternMinor
Weighted Random Choice
Viewed 0 times
randomweightedchoice
Problem
I have a Cython function that takes a list of weights/probabilities (
For example:
Returns
The Cython code is:
I profiled some complex simulation code and found that about 50% of the execution time is spent in this function which is called approximately 3.5 million times. Because it is such a time hog, and such an important utility, I'd like to speed it up if possible.
One of the problems is that the input arrays are often very large with thousands of entries.
I tried various methods from Numpy in Cython and none of them come close to as fast as what I have written in Cython. I am a Cython newbie and am curious if there is something else I need to do to speed this up.
double) and returns a random index into the list.For example:
choose_one(np.array([0.1, 0.4, 0.2, 0.3]), 4)Returns
0 with probability 0.1, 1 with probability 0.4 etc.The Cython code is:
cdef extern from "stdlib.h":
double drand48()
void srand48(long int seedval)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def choose_one(np.ndarray[double, ndim=1] weights, Py_ssize_t length):
cdef Py_ssize_t idx, i
cdef double cs
cdef double random
random = drand48()
cs = 0.0
i = 0
while cs < random and i < length:
cs += weights[i]
i += 1
return i - 1I profiled some complex simulation code and found that about 50% of the execution time is spent in this function which is called approximately 3.5 million times. Because it is such a time hog, and such an important utility, I'd like to speed it up if possible.
One of the problems is that the input arrays are often very large with thousands of entries.
I tried various methods from Numpy in Cython and none of them come close to as fast as what I have written in Cython. I am a Cython newbie and am curious if there is something else I need to do to speed this up.
Solution
The performance can be improved a bit more by declaring the function using
Another idea that might help is to check whether the random number is greater than 0.5 and, if it is, count backwards from the end of the array rather than the front (assuming that the values in the array always add up to one).
Also, if this function takes up half of the program's running time then it seems likely that the program could be restructured to use a different algorithm which calls this function less often, but this would probably require much more effort than the other suggestions.
cpdef (or cdef) instead of def (when the calling code is using Cython types too, check out http://docs.cython.org/src/userguide/language_basics.html#python-functions-vs-c-functions for more info).Another idea that might help is to check whether the random number is greater than 0.5 and, if it is, count backwards from the end of the array rather than the front (assuming that the values in the array always add up to one).
Also, if this function takes up half of the program's running time then it seems likely that the program could be restructured to use a different algorithm which calls this function less often, but this would probably require much more effort than the other suggestions.
Context
StackExchange Code Review Q#132809, answer score: 3
Revisions (0)
No revisions yet.