HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Find the shortest whole repetitive substring

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
theshortestwholesubstringrepetitivefind

Problem

I'm working on a problem to find wholly repeated shortest substring of a given string, and if no match, return length of the string.

My major idea is using a Trie tree to build substrings from length 1 to half length of the whole string, then traverse the Trie tree to find if there is a wholly repetitive match or not (since when I build Trie tree, I record the depth of leaf node and also how many times the leaf node has been reached).

I think my algorithm is still \$O(n^2)\$ and I'm looking for any code review comments for my current code and better ideas to improve time complexity.

Input and output example

catcatcat => 3
aaaaaa=>1
aaaaaba = > 7


My code

```
from __future__ import print_function
from collections import defaultdict
import sys
class TrieNode:
def __init__(self):
self.isEnd = False
self.children = defaultdict(TrieNode)
self.count = 0 # reached how many times
self.depth = 0 # reached by how long sub-string
def addNode(self, word):
if not word:
return
node = self
depth = 0
for ch in word:
node = node.children[ch]
node.isEnd = True
node.count += 1
node.depth = len(word)
# return minimal length of whole repetitive match
# in a recursive way
def traverseNode(self, totalLen):
if self.isEnd == True:
if (self.count * self.depth == totalLen):
return self.depth
else:
return totalLen
result = sys.maxint # set to a very big value
for node in self.children.values():
result = min(node.traverseNode(totalLen), result)

return result
if __name__ == "__main__":
results = []
word = 'catcatcatcat' # output 3
#word = 'aaaaaa' # output 1
#word = 'aaaaaab' # output 7
for step in range(1, len(word)//2 + 1):
# whole repetive string must be start from zero
i = 0
root = TrieNode()

Solution

I'm not from algorithmic background. Thanks for introducing the trie graph with this question. Below are my comments:

  • Enable PyLint and PEP8 checking in your IDE. This will inform you some basic things about naming conventions, indentation, and so on. For example, addNode() could be renamed as add_node(). Include a blank line between two methods. Include two blank lines above and below class definition.



  • Document your code formally using docstrings. In fact, it would be excellent to use docstrings to describe your algorithm. In its current state, users have to figure out how the code works. Most people will give up and instead come up with their own implementations. Even good open source projects can fail because of poor documentation. If you want others to use your code, write good documentation.



  • Constant sys.maxint is no longer available in Python3. Can change this to make it portable across Python versions. Read this: https://docs.python.org/3.1/whatsnew/3.0.html#integers



  • Your test of 'catcatcatcat' is hard-coded, with couple of other commented test strings. You could improve this by passing test strings from command line. Or you could hard-code a test vectors as a list. Or use doctest that will double up as your unit test.



  • For a string of length n, only sublengths that are factor of n and



  • Many Python programmers prefer for loops rather than while loops. Use of indices such as i can be removed by refactoring. Use enumerate() built-in function where possible if indices of a list are needed.



  • I found that traverseNode() is not required for this particular problem. What got me thinking was the argument totalLen. Your trie graph/tree must already have knowledge of the total length because you are tracking the count and depth. I then discovered that for this problem, the root must have exactly one direct child. This is enough. At least for this particular problem, is_end, count and depth are redundant.



Code is below:

"""
A module that implements trie search tree.
"""

from __future__ import print_function
from collections import defaultdict
import doctest

class TrieNode:
    """
    Node of a trie search tree.

    Contains as children other trie nodes.
    Each node contains a single character as key.
    Ref: https://en.wikipedia.org/wiki/Trie
    """
    def __init__(self):
        """
        Initialize with empty children.
        """
        self.is_end = False
        self.children = defaultdict(TrieNode)
        self.count = 0 # reached how many times
        self.depth = 0 # reached by how long sub-string

    def add_nodes(self, word):
        """
        Add child nodes recursively.

        First character of input word will be added as a direct child.
        The next character will be a child of the child, and so on.

        Args:
            word (str): Input word to process.
        """
        if not word:
            return
        node = self
        for ch in word:
            node = node.children[ch]
        node.is_end = True
        node.count += 1
        node.depth = len(word)

    @staticmethod
    def whole_shortest_string(word):
        """
        Return the length of the shortest string that spans the entire word.

        >>> TrieNode.whole_shortest_string('catcatcatcat')
        3
        >>> TrieNode.whole_shortest_string('aaaaaa')
        1
        >>> TrieNode.whole_shortest_string('aaaaaba')
        7
        """
        results = [len(word)]
        factors = (x for x in range(1, len(word)) if not len(word)%x)
        for sublen in factors:
            root = TrieNode()
            for subword in (word[x:x+sublen] for x in range(0, len(word), sublen)):
                root.add_nodes(subword)
            if len(root.children) == 1:
                results.append(sublen)
        return min(results)

if __name__ == "__main__":
    doctest.testmod()

Code Snippets

"""
A module that implements trie search tree.
"""

from __future__ import print_function
from collections import defaultdict
import doctest


class TrieNode:
    """
    Node of a trie search tree.

    Contains as children other trie nodes.
    Each node contains a single character as key.
    Ref: https://en.wikipedia.org/wiki/Trie
    """
    def __init__(self):
        """
        Initialize with empty children.
        """
        self.is_end = False
        self.children = defaultdict(TrieNode)
        self.count = 0 # reached how many times
        self.depth = 0 # reached by how long sub-string

    def add_nodes(self, word):
        """
        Add child nodes recursively.

        First character of input word will be added as a direct child.
        The next character will be a child of the child, and so on.

        Args:
            word (str): Input word to process.
        """
        if not word:
            return
        node = self
        for ch in word:
            node = node.children[ch]
        node.is_end = True
        node.count += 1
        node.depth = len(word)

    @staticmethod
    def whole_shortest_string(word):
        """
        Return the length of the shortest string that spans the entire word.

        >>> TrieNode.whole_shortest_string('catcatcatcat')
        3
        >>> TrieNode.whole_shortest_string('aaaaaa')
        1
        >>> TrieNode.whole_shortest_string('aaaaaba')
        7
        """
        results = [len(word)]
        factors = (x for x in range(1, len(word)) if not len(word)%x)
        for sublen in factors:
            root = TrieNode()
            for subword in (word[x:x+sublen] for x in range(0, len(word), sublen)):
                root.add_nodes(subword)
            if len(root.children) == 1:
                results.append(sublen)
        return min(results)


if __name__ == "__main__":
    doctest.testmod()

Context

StackExchange Code Review Q#143958, answer score: 6

Revisions (0)

No revisions yet.