HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Readability and Performance of my Trie implementation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
readabilityperformancetrieandimplementation

Problem

As an interesting exercise in Python 3.3, I implemented a Trie (also known as a Prefix Tree).
Example usages
Create from list of key-value-tuples

mapping = (('she', 1), ('sells', 5), ('sea', 10), ('shells', 19), ('today', 5))
trie = Trie.from_mapping(mapping)
some_prefix = 'sh'
some_key = 'today'

print('Keys:', *trie)
print('Items: ', *trie.items())
print('Keys prefixed "{}":'.format(some_prefix), *trie.keys(prefix=some_prefix))
trie[some_key] = -55
print('Value of "{}":'.format(some_key), trie[some_key])
print(some_key, 'is in' if some_key in trie else 'is not in', 'the Trie')


Output

Keys: today sells sea she shells
Items:  ('today', 5) ('sells', 5) ('sea', 10) ('she', 1) ('shells', 19)
Keys prefixed "sh": she shells
Value of "today": -55
today is in the Trie


Find out lengths of words in a string

words = 'She sells sea shells today she sells no shells she sells'.split()
trie = Trie.from_keys(words, value_selector=len)
print(*trie.items())


Output

('sells', 5) ('sea', 3) ('she', 3) ('shells', 6) ('today', 5) ('no', 2) ('She', 3)


Implementation

For simplicity, each entry is represented by a dict mapping characters to child entries – if this is problematic, feel free to remark on that.

class _Entry:
    def __init__(self, value=None):
        self.value = value
        self.children = {}

    def __iter__(self):
        return iter(self.children.items())


I made it iterable so I don't have to constantly write things like entry.children.items().

Here's the Trie itself – please feel free to point out any or all issues you can spot. I started taking a look at Python two weeks ago, so I'd also appreciate your comments on how Pythonic (or not) this is, as well as any potential issues of performance.

```
class Trie:
@staticmethod
def from_mapping(mapping):
"""
Constructs a Trie from an iterable of two-item tuples.
:param mapping: An iterable of key value pairs (each key must itself

Solution


  1. Bug



-
Sometimes the _length attribute is updated wrongly. First, when deleting an item, the length is decremented regardless of whether the item was actually present in the trie:

>>> trie = Trie()
>>> del trie['key']
>>> len(trie)
Traceback (most recent call last):
  File "", line 1, in 
ValueError: __len__() should return >= 0


Second, when updating an item, the length is incremented regardless of whether the item was actually new:

>>> trie = Trie()
>>> trie['key'] = 1
>>> trie['key'] = 2
>>> print(*trie)
key
>>> len(trie)
2


  1. Other comments



-
You have docstrings for all your methods: awesome! You might consider adding a docstring for the Trie class (look at the output of help(Trie) and consider what else is needed). This is also a good class to consider adding doctests.

-
Since the class _Entry is only used in Trie, you could make it an attribute of Trie.

-
In the _Entry class, you use None as a placeholder to mean that there is no value associated with this entry. This prevents anyone from storing None as a value in a Trie. Instead, you could leave the value attribute unset to indicate no value (this can be detected with the hasattr function), and write del entry.value to delete it.

-
You could have _Entry inherit from dict and then you wouldn't need to have an attribute children.

-
By using @staticmethod you prevent anyone from subclassing Trie, because when someone calls SubTrie.from_mapping, the object they get back will be a Trie rather than a SubTrie. You should use @classmethod instead:

@classmethod
def from_mapping(class_, mapping):
    trie = class_()
    # ...


-
It's not clear why you have the lines:

if not key:
    continue


in from_mapping and from_keys. Why can't I add the empty string to a Trie? (If you have a good reason, it would be worth mentioning it in the documentation, and also worth raising an error here, instead of silently discarding these keys.)

-
The __getitem__ method returns None to indicate failure. In Python it's normal to raise an exception to indicate failure rather than returning an exceptional type. In this case it would be appropriate to raise KeyError (just as the built-in dict.__getitem__ does).

-
You might consider whether Trie should inherit from MutableMapping in the collections.abc module: this would give you values, get, __eq__, __ne__, pop, popitem, clear, update and setdefault methods for free.

-
Since your Trie class presents a mapping-like interface, Python programmers will expect its update method to present a similar interface to the update method on ordinary dicationary objects. (This is easy if you inherit from MutableMapping since the update method is defined in that abstract base class.)

(Your original update would then need a new name.)

-
Similarly, since your Trie class presents a mapping-like interface, Python programmers will expect its constructor to present a similar interface to the dict constructor. (Again, this is easy if you inherit from MutableMapping since you can just pass the positional and keyword arguments to the update method.)

-
It seems perverse that the value_selector argument to the update method is different from the value_selector argument to the from_keys method. (The former takes the old value as its argument but the latter takes the key as the argument.) This can only be confusing to users.

Also, if your update method is called, but there's no value in the trie for key, the value_selector function gets called anyway (with the argument None). You might consider whether this method should take a third argument, the value to use in the case where there is no value in the trie. (Or maybe you should raise KeyError in this case.)

I have a feeling that your from_keys and update methods are not very Pythonic. Instead of from_keys, callers would be happy to write:

Trie((key, f(key)) for key in keys)


and instead of trie.update(key, lambda old_value: old_value + 1) caller would be happy to write

trie[key] += 1


My guess is that your goal in writing the update method is to avoid the double lookup in the above. So it's a reasonable thing to provide, it just needs some thought about the name and interface.

-
You could simplify some of your logic if your _find method was able to create new _Entry objects. See below for how.

-
If your function body consists only of yield from , you might as well write return .

  1. Revised code



This is how I would write the class, addressing all the points above.

```
from collections.abc import MutableMapping

class Trie(MutableMapping):
"""
A prefix tree.

>>> mapping = dict(she=1, sells=5, sea=10, shells=19, today=5)
>>> trie = Trie(mapping)
>>> some_prefix = 'sh'
>>> some_key = 'today'
>>> print(*sorted(trie))
sea sells she shells today
>>> print(

Code Snippets

>>> trie = Trie()
>>> del trie['key']
>>> len(trie)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: __len__() should return >= 0
>>> trie = Trie()
>>> trie['key'] = 1
>>> trie['key'] = 2
>>> print(*trie)
key
>>> len(trie)
2
@classmethod
def from_mapping(class_, mapping):
    trie = class_()
    # ...
if not key:
    continue
Trie((key, f(key)) for key in keys)

Context

StackExchange Code Review Q#24602, answer score: 8

Revisions (0)

No revisions yet.