patternpythonMinor
Finding the largest repeating substring
Viewed 0 times
thesubstringrepeatinglargestfinding
Problem
Here's my code that takes a large string, and searches for the longest reoccurring substring. It compares the first letter with every other until it finds a match, then saves it. Then it compares the first 2 letters with every other until it finds a match then saves it. Then 3, 4, etc... Then it comes back around and starts with the second letter and checks the first letter, then the first 2, then the first 3, etc.
I plan on using this on text files as large as a novel. The time complexity is horrendous. I think it's O(n# of characters in the text file). Is there any other way to approach this?
I plan on using this on text files as large as a novel. The time complexity is horrendous. I think it's O(n# of characters in the text file). Is there any other way to approach this?
def largest_substring(string):
length = 0
x=0
y=0
for y in range(len(string)):
for x in range(len(string)):
substring = string[y:x]
if len(list(re.finditer(re.escape(substring),string))) > 1 and len(substring) > length:
match = substring
length = len(substring)
return matchSolution
I guess a prefix tree might help.
Overall complexity is \$O(ns)\$
Where \$n\$ is the length of the text and \$s\$ is the length of the longest recurring substring.
- Build a parallel array of pointers (sorry for a C-speak) into an original text.
- Sort it.
- Scan it for a longest match in two consecutive entries.
Overall complexity is \$O(ns)\$
Where \$n\$ is the length of the text and \$s\$ is the length of the longest recurring substring.
Context
StackExchange Code Review Q#63329, answer score: 3
Revisions (0)
No revisions yet.