HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Probabilistically pick a candidate

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
pickprobabilisticallycandidate

Problem

I have a list of candidates which looks like this:

candidates = [
    {
        'name': 'John',
        'rank': 13
    },
    {
        'name': 'Joe',
        'rank': 8
    },
    {
        'name': 'Averell',
        'rank': 5
    },
    {
        'name': 'William',
        'rank': 2
    }
]


What I want to do is semi-randomly pick one of these candidates, based on its squared rank. So that a candidate A having a rank twice as big as B, will have 4 times more chances to be picked than B.

Here is a naive implementation of the idea I had to solve this problem:

def pick_candidate(candidates):
    # initiates a global probability space
    prob_space = []

    # bring the max rank to 20
    # to avoid useless CPU burning
    # while keeping a good granularity -
    # basically, just bring the top rank to 20
    # and the rest will be divided proportionally
    rate = candidates[0]['rank'] / 20

    # fills the probability space with the indexes
    # of 'candidates', so that prob_space looks like:
    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    # if candidates[0] has rank 3 and candidates[1] has rank 2
    for i, c in enumerate(candidates):
        rank = c['rank'] / rate
        for j in range(int(rank*rank)):
            prob_space.append(i)

    # picks a random index from the probability space
    picked_prob_space_index = random.randint(0, len(prob_space)-1)

    # retrieves the matching candidate
    picked_candidate_index = prob_space[picked_prob_space_index]

    return candidates[picked_candidate_index]


The questions I'm thinking of, concerning the above code, are:

  • Concept: Is the core principle of the algorithm (the idea of building prob_space, etc) overkill and solvable more easily in other ways or with builtins



  • Implementation: This core principle put aside, what do you think of the implementation? Would you think of a better, cleaner way to write it?

Solution

Concept

  • imho you don't need build prob_space or normalize by max rank == 20



-
you only need

John: 1 ... 13^2
Joe:  13^2+1 ... 13^2+8^2
Averell: 13^2+8^2+1 ... 13^2+8^2+5^2
William: 13^2+8^2+5^2 ... 13^2+8^2+5^2+2^2
and choice a man having ranint(1, 13^2+8^2+5^2+2^2)


  • transform list [132, 82, 52, ...] to [132, 132+82, 132+82+5**2, ...] we can by accumulate



  • searching in this sorted list [132, 132+82, 132+82+52, ...] we can by binary search algorithm implemented in bisect module



-
the algorithm:

from collections import namedtuple
from itertools import accumulate
from random import randint
from bisect import bisect_left

Candidate = namedtuple('Candidate', 'name rank')

def pick_random_candidate(candidates):
    assert (len(candidates) > 0) # note: probably we want return None with empty list?
    squared_ranks_accumulated = list(accumulate(c.rank ** 2 for c in candidates))
    random_pick = randint(1, squared_ranks_accumulated[-1])
    return candidates[bisect_left(squared_ranks_accumulated, random_pick)]

if __name__ == "__main__":
    candidates = [Candidate(*t) for t in [('John', 13), ('Joe', 8), ('Averell', 5), ('William', 2)]]
    random_candidate = pick_random_candidate(candidates)
    print(random_candidate)


Implementation

  • better use from random import randint instead of import random



  • if you have randint(0, x - 1) you can replace it by randrange(x)



  • constants like 20 should be defined as variable, e.g. top_rank = 20 or even def pick_candidate(candidates, top_rank=20). Reason: if someone change 20 to 21 in next three months there are high probability that don't change comment (so better when comment explains top_rank not 20)



  • you can often define structure Candidate as namedtuple (dict is better only sometimes if have some dynamic fields to store)



  • what's with the empty list of candidates (exception KeyError is expected or something else should be?)

Code Snippets

John: 1 ... 13^2
Joe:  13^2+1 ... 13^2+8^2
Averell: 13^2+8^2+1 ... 13^2+8^2+5^2
William: 13^2+8^2+5^2 ... 13^2+8^2+5^2+2^2
and choice a man having ranint(1, 13^2+8^2+5^2+2^2)
from collections import namedtuple
from itertools import accumulate
from random import randint
from bisect import bisect_left

Candidate = namedtuple('Candidate', 'name rank')


def pick_random_candidate(candidates):
    assert (len(candidates) > 0) # note: probably we want return None with empty list?
    squared_ranks_accumulated = list(accumulate(c.rank ** 2 for c in candidates))
    random_pick = randint(1, squared_ranks_accumulated[-1])
    return candidates[bisect_left(squared_ranks_accumulated, random_pick)]


if __name__ == "__main__":
    candidates = [Candidate(*t) for t in [('John', 13), ('Joe', 8), ('Averell', 5), ('William', 2)]]
    random_candidate = pick_random_candidate(candidates)
    print(random_candidate)

Context

StackExchange Code Review Q#127805, answer score: 4

Revisions (0)

No revisions yet.