HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Prefix search by trie tree

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
prefixtrietreesearch

Problem

I'm working on the prefix search problem:


Given a set of words, for example words = ['a', 'apple', 'angle', 'angel', 'bat', 'bats'], for any given prefix, find all matched words. For example, if input is ang, return 'angle', 'angel', if no match, return an empty list [].

Any advice on performance improvement in terms of algorithm time complexity (not sure if Trie Tree is the best solution), code bugs or general advice on code style is highly appreciated.

from collections import defaultdict
class TrieNode:
    def __init__(self):
        self.children = defaultdict(TrieNode)
        self.isEnd = False
    def insert(self, word):
        node = self
        for w in word:
            node = node.children[w]
        node.isEnd = True
    def search(self, word):
        node = self
        for w in word:
            if w in node.children:
                node = node.children[w]
            else:
                return []
        # prefix match
        # traverse currnt node to all leaf nodes
        result = []
        self.traverse(node, list(word), result)
        return [''.join(r) for r in result]

    def traverse(self, root, prefix, result):
        if root.isEnd:
            result.append(prefix[:])
        for c,n in root.children.items():
            prefix.append(c)
            self.traverse(n, prefix, result)
            prefix.pop(-1)
if __name__ == "__main__":
    words = ['a', 'apple', 'angle', 'angel', 'bat', 'bats']
    root = TrieNode()
    for w in words:
        root.insert(w)
    print root.search('a') # 'a', 'apple', 'angle', 'angel'
    print root.search('ang') # 'angle', 'angel'
    print root.search('angl') # 'angle'
    print root.search('z') # []

Solution

The Trie definitely fits the problem greatly. Here are some other points, adding to the @coderodde's awesome answer:

-
I don't particularly like the way you traverse the trie to get all paths to the leaf nodes. I would make traverse() method a generator:

def traverse(self, root, prefix):
    if root.is_leaf:
        yield prefix

    for c, n in root.children.items():
        prefix.append(c)
        yield from self.traverse(n, prefix)  # Python 3.3+
        prefix.pop()


Then, you can improve your search() method by returning from traverse():

return [''.join(r) for r in self.traverse(node, list(word))]


-
you can define __slots__ to improve on memory usage and performance:

class TrieNode:
    __slots__ = ['children', 'is_leaf']


-
note the "currnt" typo

  • put 2 newlines after the import statements (PEP8 reference)

Code Snippets

def traverse(self, root, prefix):
    if root.is_leaf:
        yield prefix

    for c, n in root.children.items():
        prefix.append(c)
        yield from self.traverse(n, prefix)  # Python 3.3+
        prefix.pop()
return [''.join(r) for r in self.traverse(node, list(word))]
class TrieNode:
    __slots__ = ['children', 'is_leaf']

Context

StackExchange Code Review Q#155303, answer score: 3

Revisions (0)

No revisions yet.