Managing and searching objects using tags
Problem
I wonder:
- Is it appropriate to hide imported modules (`collections` and `UserDict` in this case) from a Python IDE (e.g. IPython)?
- Is there a more efficient algorithm/implementation?
Please feel free to comment on how you would improve this class.
```
import collections as _collections
import UserDict as _UserDict

class _IdDict(_UserDict.IterableUserDict):
    def __missing__(self,key):
        raise KeyError("The item requested is not in the TagDict. "+\
                       "Perhaps more than one item were requested.")

class TagDict(object):
    '''
    TagDict is similar to a dictionary except
    Keys are unique tags/attributes of all the items
    Each key can be mapped onto multiple items that have
    that key as a tag
    TagDict[*tags] returns a list of items that share the same tags
    TagDict["*"] returns all the items
    '''
    def __init__(self):
        # Keys are tags. Values are sets of ids
        self.data = _collections.defaultdict(set)
        # Keys are ids of the objects. Values are (object,tags)
        self._ids = _IdDict()

    def add(self,item,tags):
        ''' Add an item with a list of tags
        if tags is empty, the item will not be added to
        the TagDict
        Input:
        item - an object
        tags - a string or a list of strings
        '''
        if type(tags) is str: tags = [tags,]
        tags = set(tags)
        self._ids[id(item)] = (item,tags)
        for tag in tags:
            self.data[tag].add(id(item))

    def __getitem__(self,tags):
        ''' Get the items that share the tags
        Return:
        A list of object
        If the list contains only one object, return the object
        Example:
        TagDict["a","b"] returns all items that have both "a"
        and "b" as tags
        TagDict["*"] returns all the items in the TagDict
        '''
        if tags[0] == '*':
            return [ value[0] for value in self._ids.values() ]
        if type(tags)
```
Solution
Sean Perry has made several good points, which I won't duplicate. I do think, though, that your imports with leading underscores are fine! A leading underscore in a global name signals that the item is not part of the module's public interface.
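To make the point about leading underscores concrete, here is a small Python 3 sketch. The module name `tagdemo` and the function `new_index` are made up for the demonstration, and the module is built in memory only so that the example is self-contained. When a module defines no `__all__`, `from module import *` skips names that start with an underscore, so an alias such as `_collections` does not leak into the importer's namespace, whereas a plain `import collections` does.

```python
import sys
import types

# Source of a throwaway module; "tagdemo" is a hypothetical name.
SOURCE = """
import collections as _collections   # private alias, as in the question
import collections                   # public name, for comparison

def new_index():
    return _collections.defaultdict(set)
"""

demo = types.ModuleType("tagdemo")
exec(SOURCE, demo.__dict__)
sys.modules["tagdemo"] = demo   # register it so the import below can find it

namespace = {}
exec("from tagdemo import *", namespace)

# Ignore the dunder entries that exec() adds (e.g. __builtins__).
star_imported = sorted(name for name in namespace if not name.startswith("__"))
print(star_imported)   # ['collections', 'new_index']
```

The underscore-prefixed alias is still reachable as `tagdemo._collections`; the prefix is a convention that tools respect, not access control.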
First off, there is usually no need to use `UserDict` in new code. As the docs for that module say:
"The need for this class has been largely supplanted by the ability to subclass directly from dict (a feature that became available starting with Python version 2.2). Prior to the introduction of dict, the UserDict class was used to create dictionary-like sub-classes that obtained new behaviors by overriding existing methods or adding new ones."
So, your `_IdDict` class should probably inherit from `dict` directly, rather than from `UserDict`, unless you need to support Python versions older than 2.2! This will also improve your forward compatibility, as the `UserDict` module has been removed in Python 3.
Or, you could probably do without the special dict subclass entirely, and handle the exception raising in the `TagDict` yourself. Just catch whatever exception gets raised by a normal dictionary, and raise your own (in Python 3, you'd want to use `raise Whatever() from None` to suppress the previous exception context, but in Python 2 that's neither possible nor necessary).
I see a few things that could be improved in your `TagDict` class itself.
You should probably add a check in `add` to make sure the item being added isn't in the dictionary already. If it is, and the tags it's being added under are not the same as the ones it was under previously, you may end up with inconsistent information in your `data` and `_ids` dicts.
In `__getitem__` you have the expression `self.data[tag] if tag in self.data else set()` in your list comprehension. You can write this more concisely as `self.data.get(tag, set())`.
But you might need to think about whether that is what you actually want to happen. If a requested tag is not found in the `data` dictionary, the intersection of the sets is going to be empty. This means you'll end up returning an empty tuple. Perhaps you should raise an exception instead?
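The suggestions above could fit together roughly as follows. This is a Python 3 sketch under stated assumptions, not the original code: the names `IdDict`, `TagIndex`, `find` and the `strict` flag are hypothetical, the original `__getitem__` is not reproduced, and replacing an item's old tags when it is added again is only one possible policy (merging the tags or raising would be just as defensible).

```python
class IdDict(dict):
    """Plain dict subclass; dict.__getitem__ calls __missing__ on a miss."""

    def __missing__(self, key):
        raise KeyError("The item requested is not in the TagDict.")


class TagIndex:
    """Hypothetical stand-in for TagDict (a sketch, not the original class)."""

    def __init__(self):
        self.data = {}          # tag -> set of ids
        self._ids = IdDict()    # id(item) -> (item, tags)

    def add(self, item, tags):
        if isinstance(tags, str):
            tags = [tags]
        tags = set(tags)
        key = id(item)
        if key in self._ids:
            # Item was added before: remove it from the tags it no longer
            # carries, so that data and _ids cannot drift apart.
            for stale in self._ids[key][1] - tags:
                self.data[stale].discard(key)
                if not self.data[stale]:
                    del self.data[stale]
        self._ids[key] = (item, tags)
        for tag in tags:
            self.data.setdefault(tag, set()).add(key)

    def find(self, *tags, strict=False):
        """Return the items that carry every tag in `tags`."""
        if not tags:
            return [item for item, _ in self._ids.values()]
        id_sets = []
        for tag in tags:
            if strict:
                try:
                    id_sets.append(self.data[tag])
                except KeyError:
                    # "from None" hides the internal KeyError's context.
                    raise KeyError("unknown tag: %r" % (tag,)) from None
            else:
                # Lenient mode: an unknown tag contributes an empty set.
                id_sets.append(self.data.get(tag, set()))
        common = set.intersection(*id_sets)
        return [self._ids[key][0] for key in common]


index = TagIndex()
first, second = object(), object()
index.add(first, ["a", "b"])
index.add(second, "b")
```

With `strict=False` an unknown tag simply yields an empty result, which matches the `self.data.get(tag, set())` behaviour; with `strict=True` the caller gets an exception instead, which is the alternative raised in the last paragraph.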
Context
StackExchange Code Review Q#49555, answer score: 3