patternpythonMinor
Quick select the kth smallest value
Viewed 0 times
smallestthevaluequickkthselect
Problem
Here is my code for quick select in Python 2.7. I am trying to optimize code for fewer comparisons. And my specific question is marked in code, 3 lines marked with
Any performance improvement in terms of algorithm time complexity, code bugs or code style are appreciated.
Source code in Python 2.7,
more safe way. Wondering if my code is safe enough to handle all corner cases (in more details, I think more safe way definitely can handle all corner cases, and wondering if my current code which has less comparison is also good to handle all corner cases)? I have tested (see main function) and also tested using median as an example.Any performance improvement in terms of algorithm time complexity, code bugs or code style are appreciated.
Source code in Python 2.7,
import random
def quick_select(numbers, start, end, k):
if start == end:
if k == 0:
return numbers[start]
else:
raise Exception('invalid input!')
pivot = numbers[end]
i = start
j = end - 1
while i pivot: # more safe way: while i pivot
j -= 1
if i k:
return quick_select(numbers, start, i-1, k)
else:
return quick_select(numbers, i+1, end, k-i+start-1)
def median(numbers):
if len(numbers) % 2 == 1:
quick_select(numbers, 0, len(numbers)-1, len(numbers) / 2)
else:
x = quick_select(numbers, 0, len(numbers)-1, len(numbers)/2 - 1)
y = quick_select(numbers, 0, len(numbers) - 1, len(numbers) / 2)
return (x+y)/2.0
if __name__ == "__main__":
numbers = []
for i in range(10):
numbers.append(random.randint(0,10))
k = random.randint(0, 9)
numbers = [7, 5, 2, 3, 7, 2, 8, 7, 4, 9]
k = 2
print 'original ', numbers
print 'k ', k
print quick_select(numbers, 0, len(numbers)-1, k)
print 'median', median(numbers)
print 'sorted ', sorted(numbers)Solution
Handle the empty list correctly
Let’s suppose I try to do a quick select on the empty list: your code throws an IndexError.
It would be better if it returned a useful error message in this case: for example, raising a
Better bounds checking
The first time I used your code, I had an off-by-one error in the
Similar calls where you use
Or even better – don’t let me handle them at all. They’re required parameters for the recursive step, but if I’m passing you a list, I probably want to run quick select over the entire list. For that case, make those parameters optional, and pick sensible defaults:
(If those still aren’t the correct defaults, you definitely need a better docstring and error checking.)
Bug: elements are not selected according to their sort order
It’s possible that I’ve misunderstood the algorithm – in which case, please correct me in the comments. Consider the following example:
My intuition is that the quickselect result should match the values in the original list. Your code doesn’t seem to be handling that correctly.
I found this bug using Hypothesis to test your code:
It looks for the smallest element in the list using your algorithm, then compares that to the result if it sorts and takes the first element with
Hypothesis could do some very powerful testing of this sort of function. (Disclaimer: I’m one of the project owners on GitHub.)
Bug: results are dependent on the initial sort order
Your function shuffles the elements of the list. In theory, this shouldn’t affect the result, right? The smallest element of a list is invariant under shuffling. In practice, it’s a bit different. Compare the following examples:
I didn’t find this bug with Hypothesis – I was just playing with the example in the previous bug, and stumbled upon it by accident.
Absence of comments
You’ll notice I haven’t actually reviewed any of the code you’ve written. I’m not familiar with the quickselect algorithm, and the Wikipedia page is more complicated than I care to read on a Saturday evening.
There aren’t any comments in your code – this makes it quite difficult to understand. It would be good if there were some comments to explain the purpose of the algorithm – why do we choose this pivot? Why does the algorithm work this way? Why do we know this is correct? etc.
Not only does that make the code much easier to read, review and maintain, it would also make it more likely that you’d spot where the bugs are, because you can see where it deviates from the correct quickselect algorithm.
Let’s suppose I try to do a quick select on the empty list: your code throws an IndexError.
>>> from quickselect import quick_select
>>> quick_select([], 0, 0, 0)
Traceback (most recent call last):
File "", line 1, in
File "quickselect.py", line 10, in quick_select
return numbers[start]
IndexError: list index out of range
It would be better if it returned a useful error message in this case: for example, raising a
ValueError if you detect the passed in list is empty:if not numbers:
raise ValueError('Cannot perform quickselect on an empty list')Better bounds checking
The first time I used your code, I had an off-by-one error in the
end parameter. I made the following call:>>> x = [1, 2, 3]
>>> quick_select(x, 0, len(x), 0)
Traceback (most recent call last):
File "", line 1, in
File "quickselect.py", line 15, in quick_select
j = end - 1
IndexError: list index out of range
Similar calls where you use
len(numbers) instead of len(numbers) - 1 will cause different varieties of IndexError. It would be better if this code had a docstring and/or error messages to push me in the right direction – tell me if my start or end variables don’t make sense for the list that I’ve passed in.Or even better – don’t let me handle them at all. They’re required parameters for the recursive step, but if I’m passing you a list, I probably want to run quick select over the entire list. For that case, make those parameters optional, and pick sensible defaults:
def quick_select(numbers, k, start=None, end=None):
if start is None:
start = 0
if end is None:
end = len(numbers) - 1(If those still aren’t the correct defaults, you definitely need a better docstring and error checking.)
Bug: elements are not selected according to their sort order
It’s possible that I’ve misunderstood the algorithm – in which case, please correct me in the comments. Consider the following example:
>>> from quickselect import quick_select
>>> for k in [0, 1, 2, 3]:
... print(k, quick_select([0, 1, 2, 3], 0, 3, k))
...
0 1
1 0
2 3
3 2
My intuition is that the quickselect result should match the values in the original list. Your code doesn’t seem to be handling that correctly.
I found this bug using Hypothesis to test your code:
from hypothesis import given
from hypothesis.strategies import integers, lists
@given(lists(integers(), min_size=1))
def test_smallest_is_smallest(xs):
x = quick_select(xs, 0)
assert x == sorted(xs)[0]It looks for the smallest element in the list using your algorithm, then compares that to the result if it sorts and takes the first element with
sorted(). If they don’t match, it raises an error. That produced the example above.Hypothesis could do some very powerful testing of this sort of function. (Disclaimer: I’m one of the project owners on GitHub.)
Bug: results are dependent on the initial sort order
Your function shuffles the elements of the list. In theory, this shouldn’t affect the result, right? The smallest element of a list is invariant under shuffling. In practice, it’s a bit different. Compare the following examples:
>>> quick_select([1, 2, 3], 0, 2, 0)
1
>>> quick_select([1, 2, 3], 0, 2, 1)
3
>>> x = [1, 2, 3]
>>> quick_select(x, 0, 2, 0)
1
>>> quick_select(x, 0, 2, 1)
2
I didn’t find this bug with Hypothesis – I was just playing with the example in the previous bug, and stumbled upon it by accident.
Absence of comments
You’ll notice I haven’t actually reviewed any of the code you’ve written. I’m not familiar with the quickselect algorithm, and the Wikipedia page is more complicated than I care to read on a Saturday evening.
There aren’t any comments in your code – this makes it quite difficult to understand. It would be good if there were some comments to explain the purpose of the algorithm – why do we choose this pivot? Why does the algorithm work this way? Why do we know this is correct? etc.
Not only does that make the code much easier to read, review and maintain, it would also make it more likely that you’d spot where the bugs are, because you can see where it deviates from the correct quickselect algorithm.
Code Snippets
if not numbers:
raise ValueError('Cannot perform quickselect on an empty list')def quick_select(numbers, k, start=None, end=None):
if start is None:
start = 0
if end is None:
end = len(numbers) - 1from hypothesis import given
from hypothesis.strategies import integers, lists
@given(lists(integers(), min_size=1))
def test_smallest_is_smallest(xs):
x = quick_select(xs, 0)
assert x == sorted(xs)[0]Context
StackExchange Code Review Q#154325, answer score: 5
Revisions (0)
No revisions yet.