Confidence score calculation
Problem
I was looking for a way to calculate a score 'x' for each pair of elements in two separate arrays. The goal of the code is to return an output array containing the score for each entry in the input arrays.
from math import sqrt
import numpy as np

downs = np.genfromtxt('path\file.csv', dtype=long, delimiter=',', skiprows=1, usecols=(1,))
ups = np.genfromtxt('path\file.csv', dtype=long, delimiter=',', skiprows=1, usecols=(2,))
n = np.add(ups,downs)

def _confidence(n):
    for i, j in zip[n, ups]:
        z = 1.0 #1.0 = 85%, 1.6 = 95%
        if n == 0:
            return 0
        phat = float(ups) / n
        x = ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))
        print [x]

Solution
It looks like you are trying to calculate the Wilson Score lower confidence bound for ranking as described here: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
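For reference, the expression that the code in this post evaluates can be written out as below. This is transcribed from the code itself rather than from the linked article: p-hat is the fraction of up votes, n is the total number of votes, and z is the normal quantile for the chosen confidence level. The code defines the score as 0 when n = 0.

```latex
\[
\text{lower bound} =
\frac{\hat{p} + \dfrac{z^{2}}{2n}
      - z\sqrt{\dfrac{\hat{p}\,(1-\hat{p}) + \dfrac{z^{2}}{4n}}{n}}}
     {1 + \dfrac{z^{2}}{n}},
\qquad
\hat{p} = \frac{\text{ups}}{n},
\quad
n = \text{ups} + \text{downs}
\]
```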
There are a couple of issues with the code. The main one that jumps out is that only one of the needed variables (n) is passed into the function, while the others (ups, downs) are simply read from the module namespace. Encapsulating the work in functions that take everything they need as arguments helps here. Another issue is that the loop variables i and j are never used inside the loop, and they are poor names anyway.
The following code also generates synthetic data so I could test the full code.
from math import sqrt
import numpy as np

def generate_data(filename):
    """ Generate synthetic data for StackExchange example """
    ratings_per_example = 10
    ratings = np.random.binomial(ratings_per_example,0.4,100)
    fake_ups = ratings
    fake_downs = ratings_per_example-ratings
    fake_ids = np.arange(len(ratings))
    merged = np.asarray(zip(fake_ids, fake_downs, fake_ups))
    np.savetxt(filename, merged, fmt='%d', delimiter=",", header='Fake Header')

def confidence(filename):
    """ Returns an array of lower confidence bounds from up/down rankings in file """
    downs = np.genfromtxt(filename, dtype=long, delimiter=',', skip_header=1, usecols=(1,))
    ups = np.genfromtxt(filename, dtype=long, delimiter=',', skip_header=1, usecols=(2,))
    lower_bound_ranks = [wilson_lower_bound(up, down) for up, down in zip(ups, downs)]
    return lower_bound_ranks

def wilson_lower_bound(up, down, z=1.0):
    """ http://www.evanmiller.org/how-not-to-sort-by-average-rating.html """
    n = up + down
    if n == 0:
        return 0.0
    else:
        phat = float(up) / n
        return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))

generate_data('foo.csv')
confidence('foo.csv')
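The answer's code is written for Python 2: `dtype=long` and `np.asarray(zip(...))` both rely on Python 2 behaviour. As a possible variation, and not part of the original answer, the same formula can be applied to whole arrays at once under Python 3. The sketch below assumes the counts are already loaded into array-likes; the name `wilson_lower_bound_vec` is hypothetical.

```python
import numpy as np


def wilson_lower_bound_vec(ups, downs, z=1.0):
    """Wilson lower bound for arrays of up/down counts (hypothetical helper)."""
    ups = np.atleast_1d(np.asarray(ups, dtype=float))
    downs = np.atleast_1d(np.asarray(downs, dtype=float))
    n = ups + downs

    # Entries with no votes keep the 0.0 they are initialised with,
    # matching the n == 0 branch of the scalar function.
    result = np.zeros_like(n)
    voted = n > 0

    nv = n[voted]
    phat = ups[voted] / nv
    result[voted] = (
        phat + z * z / (2 * nv)
        - z * np.sqrt((phat * (1 - phat) + z * z / (4 * nv)) / nv)
    ) / (1 + z * z / nv)
    return result


# Three items: 4 up / 6 down, no votes at all, 10 up / 0 down.
scores = wilson_lower_bound_vec([4, 0, 10], [6, 0, 0])
print(scores)
```

The boolean mask avoids dividing by zero for items without votes, so no warnings are raised and no per-item Python loop is needed. Whether this is worth it over the list comprehension depends on how many rows the file has.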
Context
StackExchange Code Review Q#73479, answer score: 3