patternpythonMinor
Bag data structure (a set that can store multiple copies of items)
Viewed 0 times
cancopiesstorebagitemsthatstructuremultipledataset
Problem
I am trying to make my class a bit cleaner, especially some of its methods like
Bag class represents and defines methods, operators, and an iterator for the Bag class. Bags are similar to sets, and have similar operations but unlike sets they can store multiple copies of items. Store the information in bags as dictionaries whose elements are keys whose associated values are the number of times the key occurs in the bag.
```
from collections import defaultdict
from goody import type_as_str
import copy
from collections import Counter
class Bag:
def __init__(self, items = []):
self.bag_value = defaultdict(int)
for item in items:
self.bag_value[item] += 1
def __repr__(self):
bag_list = []
for item, count in self.bag_value.items():
bag_list.extend(list(item*count))
return 'Bag(' + str(bag_list) + ')'
def __str__(self):
return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'
def __len__(self):
bag_len = 0
for value in self.bag_value:
bag_len += self.bag_value[value]
return bag_len
def unique(self):
return len(self.bag_value)
def __contains__(self, item):
return item in self.bag_value
def count(self, item):
if item in self.bag_value:
return self.bag_value[item]
else:
return 0
def add(self, new):
self.bag_value[new] += 1
def __add__(self,right):
mergedbag = Bag()
mergedbag.bag_value = copy.copy(self.bag_value)
for item in right.bag_value.keys():
mergedbag.bag_value[item] += right.bag_value[item]
return mergedbag
def remove(self, item):
if item in self.bag_value:
Count, __add__, remove and __iter__. I feel like I could use cleaner code on the above methods and get the same results.Bag class represents and defines methods, operators, and an iterator for the Bag class. Bags are similar to sets, and have similar operations but unlike sets they can store multiple copies of items. Store the information in bags as dictionaries whose elements are keys whose associated values are the number of times the key occurs in the bag.
```
from collections import defaultdict
from goody import type_as_str
import copy
from collections import Counter
class Bag:
def __init__(self, items = []):
self.bag_value = defaultdict(int)
for item in items:
self.bag_value[item] += 1
def __repr__(self):
bag_list = []
for item, count in self.bag_value.items():
bag_list.extend(list(item*count))
return 'Bag(' + str(bag_list) + ')'
def __str__(self):
return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'
def __len__(self):
bag_len = 0
for value in self.bag_value:
bag_len += self.bag_value[value]
return bag_len
def unique(self):
return len(self.bag_value)
def __contains__(self, item):
return item in self.bag_value
def count(self, item):
if item in self.bag_value:
return self.bag_value[item]
else:
return 0
def add(self, new):
self.bag_value[new] += 1
def __add__(self,right):
mergedbag = Bag()
mergedbag.bag_value = copy.copy(self.bag_value)
for item in right.bag_value.keys():
mergedbag.bag_value[item] += right.bag_value[item]
return mergedbag
def remove(self, item):
if item in self.bag_value:
Solution
As you're trying to implement a
The
So, simply inherit from
Now, coming back to your code:
Never use a mutable object as default value of an argument as there value can persist between calls, the recommended way is to use
Why both
It depends on how you want to use it ultimately, but with this signature you can use it as:
I used
We can take advantage of string formatting here to make the above code a little concise and easier to read:
In our class we can obtain the length by simply summing the
We can simply drop
Here as we're only dealing with integer values
In
The recommended way to check the type of an object is to use
Never define your own dunder method(
Note that there should be only be single line of space between method of a class and your indentation is also messed up, always use 4 spaces no tabs(Change your IDE's setting to use 4 spaces for a tab). Read PEP-008 for more information on styling-guide.
Complete code:
```
from co
multiset (or bag) I would recommend you to use collections.Counter which already does the same thing:The
Counter class is similar to bags or multisets in other languages.So, simply inherit from
Counter class and add your additional methods in the class or override the already present methods from Counter class as per your requirements. Counter class also supports &, |, - operators to perform various multiset operations.Now, coming back to your code:
def __init__(self, items = []):
self.bag_value = defaultdict(int)
for item in items:
self.bag_value[item] += 1Never use a mutable object as default value of an argument as there value can persist between calls, the recommended way is to use
None as default value and then check it inside of the __init__ method. As we are now inheriting from Counter we can directly operate on self instead of using a defaultdict to keep the count.class Bag(Counter):
def __init__(self, _items=None, **kwargs):
_items = [] if _items is None else _items
super().__init__(_items, **kwargs)Why both
_items and kwargs?It depends on how you want to use it ultimately, but with this signature you can use it as:
Bag('aaabbccd')
Bag(a=3, b=5, c=7)
Bag('aabbcc', a=5, c=7, r=9)I used
_items to handle the case when user actually passed a key named items:Bag(a=5, c=7, r=9, items=9)def __repr__(self):
bag_list = []
for item, count in self.bag_value.items():
bag_list.extend(list(item*count))
return 'Bag(' + str(bag_list) + ')'Counter class already has a method to handle this called elements() which returns an itertools.chain object which is an iterator, we can call list on it and we are done:def __repr__(self):
return 'Bag({})'.format(list(self.elements()))def __str__(self):
return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'We can take advantage of string formatting here to make the above code a little concise and easier to read:
def __str__(self):
format_str = '{}[{}]'.format
return 'Bag({})'.format(', '.join(format_str(k, v) for k, v in self.items()))def __len__(self):
bag_len = 0
for value in self.bag_value:
bag_len += self.bag_value[value]
return bag_lenIn our class we can obtain the length by simply summing the
.values(), sum() returns 0 when the Bag is empty so it saves a lots of line:def __len__(self):
return sum(self.values())We can simply drop
__contains__ as Counter can already handle that part, and for unique() we can use len(self.keys()).def __add__(self,right):
mergedbag = Bag()
mergedbag.bag_value = copy.copy(self.bag_value)
for item in right.bag_value.keys():
mergedbag.bag_value[item] += right.bag_value[item]
return mergedbagHere as we're only dealing with integer values
copy.copy is not actually required, and instead of manual loop use the .update() method of dictionary:def __add__(self, right):
merged_bag = Bag()
merged_bag.update(self)
merged_bag.update(right)
return merged_bagIn
.remove() the only change is that I used string formatting with __repr__ version of item. __repr__ is more useful as with str() you won't be able to differentiate between 1 and '1'.raise ValueError('{!r} not in bag'.format(item))def __eq__(self, right):
if type(right) is not Bag:
raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag')
else:
return (Counter(self.bag_value)==Counter(right))The recommended way to check the type of an object is to use
isinstance rather than calling type. And in else block I simply called the super class's __eq__ to handle the result instead of creating Counters:def __eq__(self, right):
if not isinstance(right, Bag):
raise TypeError(('Cannot compare Bag with object of type {!r}. Can only '
'compare Bag with Bag').format(type(right).__name__))
else:
return super().__eq__(right)def __iter__(self):
return self.__generate__(self.bag_value)Never define your own dunder method(
__generate__), these methods are considered special by Python and if any future version of Python defines a method by the same name then your code can result in some unknown errors. In our __iter__ method we can simply yield from self.elements():def __iter__(self):
yield from self.elements()Note that there should be only be single line of space between method of a class and your indentation is also messed up, always use 4 spaces no tabs(Change your IDE's setting to use 4 spaces for a tab). Read PEP-008 for more information on styling-guide.
Complete code:
```
from co
Code Snippets
def __init__(self, items = []):
self.bag_value = defaultdict(int)
for item in items:
self.bag_value[item] += 1class Bag(Counter):
def __init__(self, _items=None, **kwargs):
_items = [] if _items is None else _items
super().__init__(_items, **kwargs)Bag('aaabbccd')
Bag(a=3, b=5, c=7)
Bag('aabbcc', a=5, c=7, r=9)Bag(a=5, c=7, r=9, items=9)def __repr__(self):
bag_list = []
for item, count in self.bag_value.items():
bag_list.extend(list(item*count))
return 'Bag(' + str(bag_list) + ')'Context
StackExchange Code Review Q#79055, answer score: 6
Revisions (0)
No revisions yet.