patternpythonMinor
Find the shortest whole repetitive substring
Viewed 0 times
theshortestwholesubstringrepetitivefind
Problem
I'm working on a problem to find wholly repeated shortest substring of a given string, and if no match, return length of the string.
My major idea is using a
I think my algorithm is still \$O(n^2)\$ and I'm looking for any code review comments for my current code and better ideas to improve time complexity.
Input and output example
My code
```
from __future__ import print_function
from collections import defaultdict
import sys
class TrieNode:
def __init__(self):
self.isEnd = False
self.children = defaultdict(TrieNode)
self.count = 0 # reached how many times
self.depth = 0 # reached by how long sub-string
def addNode(self, word):
if not word:
return
node = self
depth = 0
for ch in word:
node = node.children[ch]
node.isEnd = True
node.count += 1
node.depth = len(word)
# return minimal length of whole repetitive match
# in a recursive way
def traverseNode(self, totalLen):
if self.isEnd == True:
if (self.count * self.depth == totalLen):
return self.depth
else:
return totalLen
result = sys.maxint # set to a very big value
for node in self.children.values():
result = min(node.traverseNode(totalLen), result)
return result
if __name__ == "__main__":
results = []
word = 'catcatcatcat' # output 3
#word = 'aaaaaa' # output 1
#word = 'aaaaaab' # output 7
for step in range(1, len(word)//2 + 1):
# whole repetive string must be start from zero
i = 0
root = TrieNode()
My major idea is using a
Trie tree to build substrings from length 1 to half length of the whole string, then traverse the Trie tree to find if there is a wholly repetitive match or not (since when I build Trie tree, I record the depth of leaf node and also how many times the leaf node has been reached).I think my algorithm is still \$O(n^2)\$ and I'm looking for any code review comments for my current code and better ideas to improve time complexity.
Input and output example
catcatcat => 3
aaaaaa=>1
aaaaaba = > 7My code
```
from __future__ import print_function
from collections import defaultdict
import sys
class TrieNode:
def __init__(self):
self.isEnd = False
self.children = defaultdict(TrieNode)
self.count = 0 # reached how many times
self.depth = 0 # reached by how long sub-string
def addNode(self, word):
if not word:
return
node = self
depth = 0
for ch in word:
node = node.children[ch]
node.isEnd = True
node.count += 1
node.depth = len(word)
# return minimal length of whole repetitive match
# in a recursive way
def traverseNode(self, totalLen):
if self.isEnd == True:
if (self.count * self.depth == totalLen):
return self.depth
else:
return totalLen
result = sys.maxint # set to a very big value
for node in self.children.values():
result = min(node.traverseNode(totalLen), result)
return result
if __name__ == "__main__":
results = []
word = 'catcatcatcat' # output 3
#word = 'aaaaaa' # output 1
#word = 'aaaaaab' # output 7
for step in range(1, len(word)//2 + 1):
# whole repetive string must be start from zero
i = 0
root = TrieNode()
Solution
I'm not from algorithmic background. Thanks for introducing the trie graph with this question. Below are my comments:
Code is below:
- Enable PyLint and PEP8 checking in your IDE. This will inform you some basic things about naming conventions, indentation, and so on. For example,
addNode()could be renamed asadd_node(). Include a blank line between two methods. Include two blank lines above and below class definition.
- Document your code formally using docstrings. In fact, it would be excellent to use docstrings to describe your algorithm. In its current state, users have to figure out how the code works. Most people will give up and instead come up with their own implementations. Even good open source projects can fail because of poor documentation. If you want others to use your code, write good documentation.
- Constant
sys.maxintis no longer available in Python3. Can change this to make it portable across Python versions. Read this: https://docs.python.org/3.1/whatsnew/3.0.html#integers
- Your test of 'catcatcatcat' is hard-coded, with couple of other commented test strings. You could improve this by passing test strings from command line. Or you could hard-code a test vectors as a list. Or use
doctestthat will double up as your unit test.
- For a string of length n, only sublengths that are factor of n and
- Many Python programmers prefer
forloops rather thanwhileloops. Use of indices such asican be removed by refactoring. Useenumerate()built-in function where possible if indices of a list are needed.
- I found that
traverseNode()is not required for this particular problem. What got me thinking was the argumenttotalLen. Your trie graph/tree must already have knowledge of the total length because you are tracking thecountanddepth. I then discovered that for this problem, the root must have exactly one direct child. This is enough. At least for this particular problem,is_end,countanddepthare redundant.
Code is below:
"""
A module that implements trie search tree.
"""
from __future__ import print_function
from collections import defaultdict
import doctest
class TrieNode:
"""
Node of a trie search tree.
Contains as children other trie nodes.
Each node contains a single character as key.
Ref: https://en.wikipedia.org/wiki/Trie
"""
def __init__(self):
"""
Initialize with empty children.
"""
self.is_end = False
self.children = defaultdict(TrieNode)
self.count = 0 # reached how many times
self.depth = 0 # reached by how long sub-string
def add_nodes(self, word):
"""
Add child nodes recursively.
First character of input word will be added as a direct child.
The next character will be a child of the child, and so on.
Args:
word (str): Input word to process.
"""
if not word:
return
node = self
for ch in word:
node = node.children[ch]
node.is_end = True
node.count += 1
node.depth = len(word)
@staticmethod
def whole_shortest_string(word):
"""
Return the length of the shortest string that spans the entire word.
>>> TrieNode.whole_shortest_string('catcatcatcat')
3
>>> TrieNode.whole_shortest_string('aaaaaa')
1
>>> TrieNode.whole_shortest_string('aaaaaba')
7
"""
results = [len(word)]
factors = (x for x in range(1, len(word)) if not len(word)%x)
for sublen in factors:
root = TrieNode()
for subword in (word[x:x+sublen] for x in range(0, len(word), sublen)):
root.add_nodes(subword)
if len(root.children) == 1:
results.append(sublen)
return min(results)
if __name__ == "__main__":
doctest.testmod()Code Snippets
"""
A module that implements trie search tree.
"""
from __future__ import print_function
from collections import defaultdict
import doctest
class TrieNode:
"""
Node of a trie search tree.
Contains as children other trie nodes.
Each node contains a single character as key.
Ref: https://en.wikipedia.org/wiki/Trie
"""
def __init__(self):
"""
Initialize with empty children.
"""
self.is_end = False
self.children = defaultdict(TrieNode)
self.count = 0 # reached how many times
self.depth = 0 # reached by how long sub-string
def add_nodes(self, word):
"""
Add child nodes recursively.
First character of input word will be added as a direct child.
The next character will be a child of the child, and so on.
Args:
word (str): Input word to process.
"""
if not word:
return
node = self
for ch in word:
node = node.children[ch]
node.is_end = True
node.count += 1
node.depth = len(word)
@staticmethod
def whole_shortest_string(word):
"""
Return the length of the shortest string that spans the entire word.
>>> TrieNode.whole_shortest_string('catcatcatcat')
3
>>> TrieNode.whole_shortest_string('aaaaaa')
1
>>> TrieNode.whole_shortest_string('aaaaaba')
7
"""
results = [len(word)]
factors = (x for x in range(1, len(word)) if not len(word)%x)
for sublen in factors:
root = TrieNode()
for subword in (word[x:x+sublen] for x in range(0, len(word), sublen)):
root.add_nodes(subword)
if len(root.children) == 1:
results.append(sublen)
return min(results)
if __name__ == "__main__":
doctest.testmod()Context
StackExchange Code Review Q#143958, answer score: 6
Revisions (0)
No revisions yet.