patternpythonMinor
Readability and Performance of my Trie implementation
Viewed 0 times
readabilityperformancetrieandimplementation
Problem
As an interesting exercise in Python 3.3, I implemented a Trie (also known as a Prefix Tree).
Example usages
Create from list of key-value-tuples
Output
Find out lengths of words in a string
Output
Implementation
For simplicity, each entry is represented by a
I made it iterable so I don't have to constantly write things like
Here's the
```
class Trie:
@staticmethod
def from_mapping(mapping):
"""
Constructs a Trie from an iterable of two-item tuples.
:param mapping: An iterable of key value pairs (each key must itself
Example usages
Create from list of key-value-tuples
mapping = (('she', 1), ('sells', 5), ('sea', 10), ('shells', 19), ('today', 5))
trie = Trie.from_mapping(mapping)
some_prefix = 'sh'
some_key = 'today'
print('Keys:', *trie)
print('Items: ', *trie.items())
print('Keys prefixed "{}":'.format(some_prefix), *trie.keys(prefix=some_prefix))
trie[some_key] = -55
print('Value of "{}":'.format(some_key), trie[some_key])
print(some_key, 'is in' if some_key in trie else 'is not in', 'the Trie')Output
Keys: today sells sea she shells
Items: ('today', 5) ('sells', 5) ('sea', 10) ('she', 1) ('shells', 19)
Keys prefixed "sh": she shells
Value of "today": -55
today is in the TrieFind out lengths of words in a string
words = 'She sells sea shells today she sells no shells she sells'.split()
trie = Trie.from_keys(words, value_selector=len)
print(*trie.items())Output
('sells', 5) ('sea', 3) ('she', 3) ('shells', 6) ('today', 5) ('no', 2) ('She', 3)Implementation
For simplicity, each entry is represented by a
dict mapping characters to child entries – if this is problematic, feel free to remark on that.class _Entry:
def __init__(self, value=None):
self.value = value
self.children = {}
def __iter__(self):
return iter(self.children.items())I made it iterable so I don't have to constantly write things like
entry.children.items().Here's the
Trie itself – please feel free to point out any or all issues you can spot. I started taking a look at Python two weeks ago, so I'd also appreciate your comments on how Pythonic (or not) this is, as well as any potential issues of performance.```
class Trie:
@staticmethod
def from_mapping(mapping):
"""
Constructs a Trie from an iterable of two-item tuples.
:param mapping: An iterable of key value pairs (each key must itself
Solution
- Bug
-
Sometimes the
_length attribute is updated wrongly. First, when deleting an item, the length is decremented regardless of whether the item was actually present in the trie:>>> trie = Trie()
>>> del trie['key']
>>> len(trie)
Traceback (most recent call last):
File "", line 1, in
ValueError: __len__() should return >= 0Second, when updating an item, the length is incremented regardless of whether the item was actually new:
>>> trie = Trie()
>>> trie['key'] = 1
>>> trie['key'] = 2
>>> print(*trie)
key
>>> len(trie)
2- Other comments
-
You have docstrings for all your methods: awesome! You might consider adding a docstring for the
Trie class (look at the output of help(Trie) and consider what else is needed). This is also a good class to consider adding doctests.-
Since the class
_Entry is only used in Trie, you could make it an attribute of Trie.-
In the
_Entry class, you use None as a placeholder to mean that there is no value associated with this entry. This prevents anyone from storing None as a value in a Trie. Instead, you could leave the value attribute unset to indicate no value (this can be detected with the hasattr function), and write del entry.value to delete it.-
You could have
_Entry inherit from dict and then you wouldn't need to have an attribute children.-
By using
@staticmethod you prevent anyone from subclassing Trie, because when someone calls SubTrie.from_mapping, the object they get back will be a Trie rather than a SubTrie. You should use @classmethod instead:@classmethod
def from_mapping(class_, mapping):
trie = class_()
# ...-
It's not clear why you have the lines:
if not key:
continuein
from_mapping and from_keys. Why can't I add the empty string to a Trie? (If you have a good reason, it would be worth mentioning it in the documentation, and also worth raising an error here, instead of silently discarding these keys.)-
The
__getitem__ method returns None to indicate failure. In Python it's normal to raise an exception to indicate failure rather than returning an exceptional type. In this case it would be appropriate to raise KeyError (just as the built-in dict.__getitem__ does).-
You might consider whether
Trie should inherit from MutableMapping in the collections.abc module: this would give you values, get, __eq__, __ne__, pop, popitem, clear, update and setdefault methods for free.-
Since your
Trie class presents a mapping-like interface, Python programmers will expect its update method to present a similar interface to the update method on ordinary dicationary objects. (This is easy if you inherit from MutableMapping since the update method is defined in that abstract base class.)(Your original
update would then need a new name.)-
Similarly, since your
Trie class presents a mapping-like interface, Python programmers will expect its constructor to present a similar interface to the dict constructor. (Again, this is easy if you inherit from MutableMapping since you can just pass the positional and keyword arguments to the update method.)-
It seems perverse that the
value_selector argument to the update method is different from the value_selector argument to the from_keys method. (The former takes the old value as its argument but the latter takes the key as the argument.) This can only be confusing to users.Also, if your
update method is called, but there's no value in the trie for key, the value_selector function gets called anyway (with the argument None). You might consider whether this method should take a third argument, the value to use in the case where there is no value in the trie. (Or maybe you should raise KeyError in this case.)I have a feeling that your
from_keys and update methods are not very Pythonic. Instead of from_keys, callers would be happy to write:Trie((key, f(key)) for key in keys)and instead of
trie.update(key, lambda old_value: old_value + 1) caller would be happy to writetrie[key] += 1My guess is that your goal in writing the
update method is to avoid the double lookup in the above. So it's a reasonable thing to provide, it just needs some thought about the name and interface.-
You could simplify some of your logic if your
_find method was able to create new _Entry objects. See below for how.-
If your function body consists only of
yield from , you might as well write return .- Revised code
This is how I would write the class, addressing all the points above.
```
from collections.abc import MutableMapping
class Trie(MutableMapping):
"""
A prefix tree.
>>> mapping = dict(she=1, sells=5, sea=10, shells=19, today=5)
>>> trie = Trie(mapping)
>>> some_prefix = 'sh'
>>> some_key = 'today'
>>> print(*sorted(trie))
sea sells she shells today
>>> print(
Code Snippets
>>> trie = Trie()
>>> del trie['key']
>>> len(trie)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: __len__() should return >= 0>>> trie = Trie()
>>> trie['key'] = 1
>>> trie['key'] = 2
>>> print(*trie)
key
>>> len(trie)
2@classmethod
def from_mapping(class_, mapping):
trie = class_()
# ...if not key:
continueTrie((key, f(key)) for key in keys)Context
StackExchange Code Review Q#24602, answer score: 8
Revisions (0)
No revisions yet.