HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonCritical

Dynamic programming knapsack solution

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
knapsackprogrammingsolutiondynamic

Problem

I wrote a solution to the Knapsack problem in Python, using a bottom-up dynamic programming algorithm. It correctly computes the optimal value, given a list of items with values and weights, and a maximum allowed weight.

Any critique on code style, comment style, readability, and best-practice would be greatly appreciated. I'm not sure how Pythonic the code is; filling in a matrix seems like a natural way of implementing dynamic programming, but it doesn't "feel" Pythonic (and is that a bad thing, in this case?).

Note that the comments are a little on the verbose side; as I was writing the algorithm, I tried to be as explicit about each step as possible, since I was trying to understand it to the fullest extent possible. If they are excessive (or just plain incorrect), feel free to comment (hah!) on it.

```
import sys

def knapsack(items, maxweight):
# Create an (N+1) by (W+1) 2-d list to contain the running values
# which are to be filled by the dynamic programming routine.
#
# There are N+1 rows because we need to account for the possibility
# of choosing from 0 up to and including N possible items.
# There are W+1 columns because we need to account for possible
# "running capacities" from 0 up to and including the maximum weight W.
bestvalues = [[0] * (maxweight + 1)
for i in xrange(len(items) + 1)]

# Enumerate through the items and fill in the best-value table
for i, (value, weight) in enumerate(items):
# Increment i, because the first row (0) is the case where no items
# are chosen, and is already initialized as 0, so we're skipping it
i += 1
for capacity in xrange(maxweight + 1):
# Handle the case where the weight of the current item is greater
# than the "running capacity" - we can't add it to the knapsack
if weight > capacity:
bestvalues[i][capacity] = bestvalues[i - 1][capacity]
else:
#

Solution


  1. Review



-
The function knapsack lacks a docstring that would explain what arguments the function takes (what kind of things are in items? must items be a sequence, or can it be an iterable?) and what it returns.

-
This kind of function is ideal for doctests.

-
The comments say things like "Create an (N+1) by (W+1) 2-d list". But what is N and what is W? Presumably N is len(items) and W is maxweight, so it would be a good idea to use the same names in the comments and the code:

N = len(items)
W = maxweight


-
The comment above bestvalues fails to explain what the values in this table mean. I would write something like this:

# bestvalues[i][j] is the best sum of values for any
# subsequence of the first i items, whose weights sum
# to no more than j.


This makes it obvious why \$0 ≤ i ≤ N\$ and why \$0 ≤ j ≤ W\$.

-
In a loop like:

bestvalues = [[0] * (maxweight + 1)
              for i in xrange(len(items) + 1)]


where the loop variable (here i) is unused, it's conventional to name it _.

-
These lines could be omitted:

# Increment i, because the first row (0) is the case where no items
# are chosen, and is already initialized as 0, so we're skipping it
i += 1


and then in the rest of the code, use i + 1 instead of i and i instead of i - 1.

-
The reconstruction loop:

i = N
while i > 0:
    # code
    i -= 1


can be written like this:

for i in reversed(range(1, N + 1)):
    # code


-
The code prints an error message like this:

print('usage: knapsack.py [file]')


Error messages ought to go to standard error (not standard output). And it's a good idea not to hard-code the name of the program, because it would be easy to rename the program but forget to update the code. So write instead:

sys.stderr.write("usage: {0} [file]\n".format(sys.argv[0]))


-
The block of code that reads the problem description and prints the result only runs when __name__ == '__main__'. This makes it hard to test, for example from the interactive interpreter. It's usually best to put this kind of code in its own function, like this:

def main(filename):
    with open(filename) as f:
        # etc.

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: knapsack.py [file]')
        sys.exit(1)
    main(sys.argv[1])


and now you can run main('problem.txt') from the interpreter to test it.

-
The code reads the whole of the file into memory as a list of lines:

lines = f.readlines()


this is harmless here because the file is small, but it's usually better to process a file one line at a time, like this:

with open(filename) as f:
    maxweight = int(next(f))
    items = [[int(word) for word in line.split()] for line in f]


  1. Recursive approach



Any dynamic programming algorithm can be implemented in two ways: by building a table of partial results from the bottom up (as in the code in the post), or by recursively computing the result from the top down, using memoization to avoid computing any partial result more than once.

The top-down approach often results in slightly simpler and clearer code, and it only computes the partial results that are needed for the particular problem instance (whereas in the bottom-up approach it's often hard to avoid computing all partial results even if some of them go unused).

So we could use @functools.lru_cache to implement a top-down solution, like this:

from functools import lru_cache

def knapsack(items, maxweight):
    """Solve the knapsack problem by finding the most valuable subsequence
    of items that weighs no more than maxweight.

    items must be a sequence of pairs (value, weight), where value is a
    number and weight is a non-negative integer.

    maxweight is a non-negative integer.

    Return a pair whose first element is the sum of values in the most
    valuable subsequence, and whose second element is the subsequence.

    >>> items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)]
    >>> knapsack(items, 15)
    (11, [(2, 1), (6, 4), (1, 1), (2, 2)])

    """
    @lru_cache(maxsize=None)
    def bestvalue(i, j):
        # Return the value of the most valuable subsequence of the first
        # i elements in items whose weights sum to no more than j.
        if j < 0:
            return float('-inf')
        if i == 0:
            return 0
        value, weight = items[i - 1]
        return max(bestvalue(i - 1, j), bestvalue(i - 1, j - weight) + value)

    j = maxweight
    result = []
    for i in reversed(range(len(items))):
        if bestvalue(i + 1, j) != bestvalue(i, j):
            result.append(items[i])
            j -= items[i][1]
    result.reverse()
    return bestvalue(len(items), maxweight), result


To see how many partial solutions this needs to compute, print bestvalue.cache_info() just before returning the result. When solving the example problem in the docstring, this outputs:

```
CacheInfo(hits=1

Code Snippets

N = len(items)
W = maxweight
# bestvalues[i][j] is the best sum of values for any
# subsequence of the first i items, whose weights sum
# to no more than j.
bestvalues = [[0] * (maxweight + 1)
              for i in xrange(len(items) + 1)]
# Increment i, because the first row (0) is the case where no items
# are chosen, and is already initialized as 0, so we're skipping it
i += 1
i = N
while i > 0:
    # code
    i -= 1

Context

StackExchange Code Review Q#20569, answer score: 62

Revisions (0)

No revisions yet.