HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Binary search: first/last/random occurrence

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
randomlastsearchfirstbinaryoccurrence

Problem

I've written some code to "binary search" a list, and return the first occurrence of the target:

def bsearch(a, left, right, target, first_or_last = 'first'):
    """
    Use binary search to find the first, last, or random index
    of target.

    >>> a = [1,1,1,1,1,1]
    >>> bsearch(a, 0, len(a)-1, 1, 'first')
    0
    """

    if left > right:
        return -1

    mid = (left + right)//2
    val = a[mid]

    if val == target:
        first_occur = mid
        if first_or_last == 'first'
            later_occur = bsearch(a, left, mid - 1, target)  # check if there's an earlier occurrence in the left sub-list
            if later_occur != -1:
                return later_occur
            else:
                return first_occur

        elif first_or_last == 'last':
            later_occur = bsearch(a, mid + 1, right, target)  
            if later_occur != -1:
                return later_occur
            else:
                return first_occur
        else:
            return first_occur

    if target  val:
        return bsearch(a, mid + 1, right, target)


Is there a more elegant/cleaner way to give this function the ability to find the first, last, or random occurrence of the target, without use of a string to represent the cases (as I've done)? Suggestions would be great.

Solution

You want to have the ability to choose between finding the first, last and a random occurrence of an element in a list. Since you have three options, you cannot use true/false, and numbers aren't very intuitive (and they usually end up magic numbers). The remaining options are strings, functions or some other data structure. Since you don't like strings, and other data structures wouldn't make much sense (why use a complex object when a simple one will suffice?), then let's stick to functions.

But wait, you don't want any code duplication. That's perfectly ok. We note that all 3 of these options involve variations of calling the basic binary search.

Let's create a basic binary search that outputs a few extras:

def _binary_search(array, element, low, high):
    """Binary search that returns (index, low, high) of the final search step"""
    while low  element:
            high = mid - 1

    return (-1, 0, 0)


Then, let's create the new functions:

def lower_bound(array, element):
    """Returns the index of the first occurrence of element in array"""
    index = -1
    first_index, low, high = _binary_search(array, element, 0, len(array)-1)
    index = first_index
    while first_index != -1:
        index = first_index
        first_index, low, high = _binary_search(array, element, low, first_index-1)
    return index

def upper_bound(array, element):
     """Returns the index of the last occurence of element in array"""
     index = -1
     first_index, low, high = _binary_search(array, element, 0, len(array)-1)
     index = first_index
     while first_index != -1:
         index = first_index
         first_index, low, high = _binary_search(array, element, first_index+1, high)
     return index

def binary_search(array, element):
    """Basic binary search"""
    lower_bound = 0
    upper_bound = len(array) - 1
    return _binary_search(array, element, lower_bound, upper_bound)[0]


This has the advantage of being readable, not susceptible to spelling errors ("First" vs "first" vs "fist") and not having much in the way of duplicated code.

As a fun fact: this entire problem can be easily solved by making wrappers around bisect_left and bisect_right from the bisect module like this very convenient SortedCollection Python recipe has done here.

Update:

  • Renamed functions to comply with standards as per comments



  • Changed algorithms to not create new arrays and instead do an in-place binary search



  • Changed algorithms to produce the correct results



  • Added reference to an easy to use Python recipe that implements this, and other features.

Code Snippets

def _binary_search(array, element, low, high):
    """Binary search that returns (index, low, high) of the final search step"""
    while low <= high:
        mid = (low + high) // 2
        current_element = array[mid]
        if current_element == element:
            return (mid, low, high)
        elif current_element < element:
            low = mid + 1
        elif current_element > element:
            high = mid - 1

    return (-1, 0, 0)
def lower_bound(array, element):
    """Returns the index of the first occurrence of element in array"""
    index = -1
    first_index, low, high = _binary_search(array, element, 0, len(array)-1)
    index = first_index
    while first_index != -1:
        index = first_index
        first_index, low, high = _binary_search(array, element, low, first_index-1)
    return index


def upper_bound(array, element):
     """Returns the index of the last occurence of element in array"""
     index = -1
     first_index, low, high = _binary_search(array, element, 0, len(array)-1)
     index = first_index
     while first_index != -1:
         index = first_index
         first_index, low, high = _binary_search(array, element, first_index+1, high)
     return index


def binary_search(array, element):
    """Basic binary search"""
    lower_bound = 0
    upper_bound = len(array) - 1
    return _binary_search(array, element, lower_bound, upper_bound)[0]

Context

StackExchange Code Review Q#59855, answer score: 4

Revisions (0)

No revisions yet.