Kolmogorov-Smirnov function

Submitted by: @import:stackexchange-codereview

Problem

I'm trying to compare two lists of floating point data and see if they might have come from different underlying distributions. The two samples are a before and after, so by comparing the two I thought I could detect whether any change occurred between the two timeframes.

To do this I'm using the two-sample Kolmogorov-Smirnov test. I have the following function which calculates the core statistic used in the test:

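def kolmogorov_smirnov(data1, data2):
    """
    Given two lists of data, finds the two-sample Kolmogorov–Smirnov statistic
    """
    data1 = sorted(data1)
    data2 = sorted(data2)

    index1 = 0
    index2 = 0

    ks_stat = 0

    while index1 < len(data1) and index2 < len(data2):
        if data1[index1] == data2[index2]:
            index1 += 1
            index2 += 1
        elif data1[index1] < data2[index2]:
            index1 += 1
        elif data1[index1] > data2[index2]:
            index2 += 1

        ks_stat = max(ks_stat, abs(index1/len(data1) - index2/len(data2)))

    return ks_stat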


I realise that I can also shorten the while loop like so:

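while index1 < len(data1) and index2 < len(data2):
    if data1[index1] <= data2[index2]:
        index1 += 1
    if data1[index1] >= data2[index2]:
        index2 += 1

    ks_stat = max(ks_stat, abs(index1/len(data1) - index2/len(data2)))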


Which version should I use? Also, is there anything worth pointing out about the main code?

Solution

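- The shorter version looked better to me at first sight because it is simpler, but it is actually incorrect. Both if statements are meant to compare the same two values, but incrementing index1 changes what data1[index1] refers to in the second statement (and it can even index past the end of the list). You could fix this by assigning the values to variables first:

    while index1 < len(data1) and index2 < len(data2):
        value1, value2 = data1[index1], data2[index2]
        if value1 <= value2:
            index1 += 1
        if value1 >= value2:
            index2 += 1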


- Updating ks_stat one value at a time feels a bit awkward to me. You could collect all the absolute differences in a list and take max() of it at the end, or extract the loop into a generator to avoid building the list.
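
As a rough sketch of the generator idea, something like the following might work. The helper name ecdf_differences is hypothetical (it does not appear in the question), the code assumes Python 3 (true division, and max() with a default argument), and it reuses the corrected loop as-is, so repeated values are not treated specially.

```python
def ecdf_differences(data1, data2):
    """Yield the absolute difference after each step of the corrected loop.

    Hypothetical helper name; assumes Python 3, where / is true division.
    """
    data1 = sorted(data1)
    data2 = sorted(data2)
    index1 = index2 = 0
    while index1 < len(data1) and index2 < len(data2):
        # Read both values before either index changes.
        value1, value2 = data1[index1], data2[index2]
        if value1 <= value2:
            index1 += 1
        if value1 >= value2:
            index2 += 1
        yield abs(index1 / len(data1) - index2 / len(data2))


def kolmogorov_smirnov(data1, data2):
    """Take the largest difference; default=0 covers an empty sample."""
    return max(ecdf_differences(data1, data2), default=0)


print(kolmogorov_smirnov([1.0, 2.0], [1.5, 2.5]))
```

This keeps the stepping logic in one place and leaves the choice of aggregation (max() here) to the caller.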

Code Snippets

while index1 < len(data1) and index2 < len(data2):
    value1, value2 = data1[index1], data2[index2]
    if value1 <= value2:
        index1 += 1
    if value1 >= value2:
        index2 += 1
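
To make the difference concrete, here is a small sketch that runs the shortened loop from the question next to the loop above on a tiny input. The function names advance_buggy and advance_fixed are made up for this illustration, and only the index stepping is reproduced, not the ks_stat update.

```python
def advance_buggy(data1, data2):
    """Shortened loop from the question (illustrative name)."""
    index1 = index2 = 0
    while index1 < len(data1) and index2 < len(data2):
        if data1[index1] <= data2[index2]:
            index1 += 1
        # index1 may have just changed, so this reads a different
        # element, or runs past the end of data1.
        if data1[index1] >= data2[index2]:
            index2 += 1
    return index1, index2


def advance_fixed(data1, data2):
    """Loop with both values read once per iteration (illustrative name)."""
    index1 = index2 = 0
    while index1 < len(data1) and index2 < len(data2):
        value1, value2 = data1[index1], data2[index2]
        if value1 <= value2:
            index1 += 1
        if value1 >= value2:
            index2 += 1
    return index1, index2


try:
    advance_buggy([1.0], [2.0])
except IndexError as exc:
    print("shortened loop failed:", exc)

print(advance_fixed([1.0], [2.0]))
```

With these inputs the shortened loop increments index1 to 1 and then tries to read data1[1], which does not exist, while the fixed loop simply stops.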

Context

StackExchange Code Review Q#79527, answer score: 4
