HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Bag data structure (a set that can store multiple copies of items)

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
cancopiesstorebagitemsthatstructuremultipledataset

Problem

I am trying to make my class a bit cleaner, especially some of its methods like Count, __add__, remove and __iter__. I feel like I could use cleaner code on the above methods and get the same results.

Bag class represents and defines methods, operators, and an iterator for the Bag class. Bags are similar to sets, and have similar operations but unlike sets they can store multiple copies of items. Store the information in bags as dictionaries whose elements are keys whose associated values are the number of times the key occurs in the bag.

```
from collections import defaultdict
from goody import type_as_str
import copy
from collections import Counter

class Bag:

def __init__(self, items = []):
self.bag_value = defaultdict(int)
for item in items:
self.bag_value[item] += 1

def __repr__(self):
bag_list = []
for item, count in self.bag_value.items():
bag_list.extend(list(item*count))
return 'Bag(' + str(bag_list) + ')'

def __str__(self):
return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'

def __len__(self):
bag_len = 0
for value in self.bag_value:
bag_len += self.bag_value[value]
return bag_len

def unique(self):
return len(self.bag_value)

def __contains__(self, item):
return item in self.bag_value

def count(self, item):
if item in self.bag_value:
return self.bag_value[item]
else:
return 0

def add(self, new):
self.bag_value[new] += 1

def __add__(self,right):
mergedbag = Bag()
mergedbag.bag_value = copy.copy(self.bag_value)
for item in right.bag_value.keys():
mergedbag.bag_value[item] += right.bag_value[item]
return mergedbag

def remove(self, item):
if item in self.bag_value:

Solution

As you're trying to implement a multiset (or bag) I would recommend you to use collections.Counter which already does the same thing:


The Counter class is similar to bags or multisets in other languages.

So, simply inherit from Counter class and add your additional methods in the class or override the already present methods from Counter class as per your requirements. Counter class also supports &, |, - operators to perform various multiset operations.

Now, coming back to your code:

def __init__(self, items = []):
    self.bag_value = defaultdict(int)
    for item in items:
        self.bag_value[item] += 1


Never use a mutable object as default value of an argument as there value can persist between calls, the recommended way is to use None as default value and then check it inside of the __init__ method. As we are now inheriting from Counter we can directly operate on self instead of using a defaultdict to keep the count.

class Bag(Counter):

     def __init__(self, _items=None, **kwargs):

          _items = [] if _items is None else _items
          super().__init__(_items, **kwargs)



Why both _items and kwargs?

It depends on how you want to use it ultimately, but with this signature you can use it as:

Bag('aaabbccd')
Bag(a=3, b=5, c=7)
Bag('aabbcc', a=5, c=7, r=9)


I used _items to handle the case when user actually passed a key named items:

Bag(a=5, c=7, r=9, items=9)


def __repr__(self):
    bag_list = []
    for item, count in self.bag_value.items():
         bag_list.extend(list(item*count))
    return 'Bag(' + str(bag_list) + ')'


Counter class already has a method to handle this called elements() which returns an itertools.chain object which is an iterator, we can call list on it and we are done:

def __repr__(self):
    return 'Bag({})'.format(list(self.elements()))


def __str__(self):
    return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'


We can take advantage of string formatting here to make the above code a little concise and easier to read:

def __str__(self):
    format_str = '{}[{}]'.format
    return 'Bag({})'.format(', '.join(format_str(k, v) for k, v in self.items()))


def __len__(self):
     bag_len = 0
     for value in self.bag_value:
         bag_len += self.bag_value[value]
     return bag_len


In our class we can obtain the length by simply summing the .values(), sum() returns 0 when the Bag is empty so it saves a lots of line:

def __len__(self):
    return sum(self.values())


We can simply drop __contains__ as Counter can already handle that part, and for unique() we can use len(self.keys()).

def __add__(self,right):
    mergedbag = Bag()
    mergedbag.bag_value = copy.copy(self.bag_value)
    for item in right.bag_value.keys():
        mergedbag.bag_value[item] += right.bag_value[item]
    return mergedbag


Here as we're only dealing with integer values copy.copy is not actually required, and instead of manual loop use the .update() method of dictionary:

def __add__(self, right):
    merged_bag = Bag()
    merged_bag.update(self)
    merged_bag.update(right)
    return merged_bag


In .remove() the only change is that I used string formatting with __repr__ version of item. __repr__ is more useful as with str() you won't be able to differentiate between 1 and '1'.

raise ValueError('{!r} not in bag'.format(item))


def __eq__(self, right):
    if type(right) is not Bag:
        raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag')
    else:
        return (Counter(self.bag_value)==Counter(right))


The recommended way to check the type of an object is to use isinstance rather than calling type. And in else block I simply called the super class's __eq__ to handle the result instead of creating Counters:

def __eq__(self, right):
    if not isinstance(right, Bag):
        raise TypeError(('Cannot compare Bag with object of type {!r}. Can only '
                         'compare Bag with Bag').format(type(right).__name__))
    else:
        return super().__eq__(right)


def __iter__(self):
    return self.__generate__(self.bag_value)


Never define your own dunder method(__generate__), these methods are considered special by Python and if any future version of Python defines a method by the same name then your code can result in some unknown errors. In our __iter__ method we can simply yield from self.elements():

def __iter__(self):
    yield from self.elements()


Note that there should be only be single line of space between method of a class and your indentation is also messed up, always use 4 spaces no tabs(Change your IDE's setting to use 4 spaces for a tab). Read PEP-008 for more information on styling-guide.

Complete code:

```
from co

Code Snippets

def __init__(self, items = []):
    self.bag_value = defaultdict(int)
    for item in items:
        self.bag_value[item] += 1
class Bag(Counter):

     def __init__(self, _items=None, **kwargs):

          _items = [] if _items is None else _items
          super().__init__(_items, **kwargs)
Bag('aaabbccd')
Bag(a=3, b=5, c=7)
Bag('aabbcc', a=5, c=7, r=9)
Bag(a=5, c=7, r=9, items=9)
def __repr__(self):
    bag_list = []
    for item, count in self.bag_value.items():
         bag_list.extend(list(item*count))
    return 'Bag(' + str(bag_list) + ')'

Context

StackExchange Code Review Q#79055, answer score: 6

Revisions (0)

No revisions yet.