HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Dynamic programming based range minimum query

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rangeprogrammingqueryminimumbaseddynamic

Problem

I am implementing a DP-based range minimum query data structure and algorithm in Python 2.7. Any advice on algorithm time complexity improvements, code bugs or code style issues are appreciated.

More information about range minimum query.

from collections import defaultdict
import random
def build_rmq(numbers):
    rmq_map = defaultdict(int) # key: tuple of (start, end) index, value: min value
    j = 0
    while 2 ** j - 1  end:
        start, end = end, start
    print start, end, numbers[start:end+1]
    print rmq_query(start, end, rmq_map)

Solution


  1. Review



-
There are no docstrings. What do these functions do? What do they return? How do I use them?

-
The function build_rmq constructs a persistent data structure which can then be queried by calling the function rmq_query. When you have a persistent data structure and functions that operate on it, this is good candidate for making into a class.

-
The comment on this line is not quite right:

rmq_map = defaultdict(int) # key: tuple of (start, end) index, value: min value


the second element of the key is a step, not an end index.

-
There is no need to use a defaultdict here, since the code always fills in the entry before looking it up. An ordinary dictionary would do fine.

-
This code:

j = 0
while 2 ** j - 1 < len(numbers):
    j += 1


computes the smallest power of 2 that is greater than len(numbers). It would be simpler to use int.bit_length:

j = len(numbers).bit_length()


(In ChatterOne's answer it is suggested that you use int(log(len(numbers), 2)). I think this is a dangerous suggestion, because log makes approximations that can lead to errors. For example,

int(log(2**48 - 1, 2))


evaluates to 48, but the correct result is 47. In order to check that these errors can't actually cause your code to fail, you need to do numerical analysis. This seems complicated and error-prone to me, compared to sticking with integer arithmetic where you know the results are exact.)

-
Similarly, this loop:

j = 0
while start + 2 ** j - 1 <= end:
    j += 1


becomes:

j = (end - start + 1).bit_length()


-
This code subtracts 1 from j and then adds it back again, which is not necessary:

j -= 1
for step in range(0,j+1):


-
range(0, x) can be abbreviated to range(x).

-
rmq_map[(i,step)] does not need the parentheses: rmq_map[i, step] is the same.

-
The two loops over step and i can be combined into a single loop using itertools.product

-
I would avoid the special case if step == 0 by having a separate loop to handle that case:

rmq_map = {(i, 0): m for i, m in enumerate(numbers)}


and then looping over step starting from 1.

-
In Python it is usual to represent a range in half-open fashion (exclusive at the upper end). It would be convenient if rmq_query took its arguments in the same way.

-
The top-level code runs a single range query but does not check that the result is correct. It would be easy to check the result of rmq_query(start, end) against min(numbers[start:end+1]). It would then be easy to turn this into a unit test using the features in the unittest module.

  1. Revised code



from itertools import product

class RangeMinimum(object):
    "Data structure providing efficient range-minimum queries."

    def __init__(self, numbers):
        "Build a RangeMinimum object for the given sequence of numbers."
        # Mapping from (start, step) to min(numbers[start:start + 2**step])
        self._rmq = rmq = {(i, 0): m for i, m in enumerate(numbers)}
        n = len(numbers)
        for step, i in product(range(1, n.bit_length()), range(n)):
            j = i + 2 ** (step-1)
            if j < n:
                rmq[i, step] = min(rmq[i, step-1], rmq[j, step-1])
            else:
                rmq[i, step] = rmq[i, step-1]

    def query(self, start, stop):
        "Return min(numbers[start:stop])."
        j = (stop - start).bit_length() - 1
        x = self._rmq[start, j]
        y = self._rmq[stop - 2 ** j, j]
        return min(x, y)

import unittest
import random

class TestRangeMinimum(unittest.TestCase):
    def test_range_minimum(self):
        n = 1000
        numbers = [random.randrange(n) for _ in range(n)]
        rmq = RangeMinimum(numbers)
        for _ in range(n):
            start, stop = sorted(random.sample(range(n + 1), 2))
            self.assertEqual(rmq.query(start, stop), min(numbers[start:stop]))


  1. Generalization



There's nothing special about the minimum here — exactly the same algorithm can be used for range-maximum queries, or for querying other kinds of statistic on a range (such as the longest common prefix, or the union of sets).

So that suggests this kind of generalization:

```
from itertools import product

class RangeQuery(object):
"Data structure providing efficient range queries."

def __init__(self, items, fn):
"""Build a RangeQuery object for a sequence of items.

fn -- function taking two items and returning their query
result, for example "min" to query the range-minimum. It must be
associative, commutative, and idempotent.

"""
# Mapping from (start, step) to reduce(fn, items[start:start + 2**step])
self._rq = rq = {(i, 0): item for i, item in enumerate(items)}
self._fn = fn
n = len(items)
for step, i in product(range(1, n.bit_length()), range(n)):
j = i + 2 ** (step-1)
if j < n:
rq[i, step] = fn(rq[i, s

Code Snippets

rmq_map = defaultdict(int) # key: tuple of (start, end) index, value: min value
j = 0
while 2 ** j - 1 < len(numbers):
    j += 1
j = len(numbers).bit_length()
int(log(2**48 - 1, 2))
j = 0
while start + 2 ** j - 1 <= end:
    j += 1

Context

StackExchange Code Review Q#154047, answer score: 8

Revisions (0)

No revisions yet.