
Bernoulli trials using a condition in a vectorized operation

Submitted by: @import:stackexchange-codereview

Problem

I would like to vectorize the following code so that it no longer uses a for loop but still checks a condition on each element.

import numpy as np

# n   : NumPy array holding the 'number of trials' parameter of the binomial distribution
# p   : NumPy array holding the 'probability of success' parameter of the binomial distribution
# baf : NumPy array that receives the results of np.random.binomial()

# Start with NaN everywhere; entries with fewer than 30 trials stay NaN.
baf = np.repeat(np.nan, n.size)
for index, value in enumerate(n):
    if value >= 30:
        # Fraction of successes observed in `value` trials.
        baf[index] = np.random.binomial(value, p[index]) / value


My own vectorized solution is:

baf = np.repeat(np.nan, n.size)
indices = np.where(n >= 30)[0] 
baf[indices] = np.random.binomial(n[indices], p[indices]).astype(float) / n[indices]


Is there a more efficient way to do this?

Solution

np.repeat is probably not the fastest way to set every entry of an array to the same value, and it is not what most readers would expect to see there. np.full states the intent directly:

baf = np.full((n.size,), fill_value=np.nan)
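As a quick sanity check, the sketch below builds the array both ways and compares the results. The sample values in n are made up for illustration, and n.shape is used here in place of (n.size,), which is equivalent only under the assumption that n is one-dimensional.

```python
import numpy as np

# Made-up trial counts for illustration only.
n = np.array([10, 30, 45, 5, 100])

# Both calls build a float array of NaN with one entry per element of n.
via_repeat = np.repeat(np.nan, n.size)
via_full = np.full(n.shape, np.nan)

# The two arrays agree in shape and dtype, and every entry is NaN.
same_shape = via_repeat.shape == via_full.shape
same_dtype = via_repeat.dtype == via_full.dtype
all_nan = bool(np.isnan(via_repeat).all() and np.isnan(via_full).all())
```

Both forms give the same array, so the choice is mostly about readability; any speed difference is best measured on your own data.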


For cases like this, indexing with a boolean mask is typically faster than calling np.where first to extract the integer indices. The following should give the same result with less work:

mask = n >= 30
baf[mask] = np.random.binomial(n[mask], p[mask]).astype(float) / n[mask]
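Putting the two suggestions together, a minimal self-contained sketch might look like the following. The function name, the min_trials parameter, the rng argument, and the sample data are all hypothetical and not part of the original code. The sketch also assumes Python 3, where / on integer arrays is true division (so .astype(float) is left out), and a NumPy version that provides np.random.default_rng.

```python
import numpy as np


def masked_success_fraction(n, p, min_trials=30, rng=None):
    """Return observed success fractions, with NaN where n < min_trials.

    Hypothetical helper: the original code hard-codes the threshold 30
    and calls the legacy np.random.binomial function directly.
    """
    n = np.asarray(n)
    p = np.asarray(p, dtype=float)
    if rng is None:
        rng = np.random.default_rng()

    # Start with NaN everywhere; only entries passing the condition are filled.
    baf = np.full(n.shape, np.nan)

    # Boolean mask selecting the entries with enough trials.
    mask = n >= min_trials

    # One binomial draw per selected entry, divided by its trial count.
    baf[mask] = rng.binomial(n[mask], p[mask]) / n[mask]
    return baf


# Made-up sample data for illustration only.
n = np.array([10, 30, 45, 5, 100])
p = np.array([0.5, 0.2, 0.9, 0.1, 0.5])
baf = masked_success_fraction(n, p, rng=np.random.default_rng(0))
```

Entries of baf at positions where n is below the threshold remain NaN, and the others hold a fraction between 0 and 1. Passing a seeded generator, as in the usage lines, makes the draws reproducible.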

Context

StackExchange Code Review Q#63869, answer score: 3
