HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Finding the largest repeating substring

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thesubstringrepeatinglargestfinding

Problem

Here's my code that takes a large string, and searches for the longest reoccurring substring. It compares the first letter with every other until it finds a match, then saves it. Then it compares the first 2 letters with every other until it finds a match then saves it. Then 3, 4, etc... Then it comes back around and starts with the second letter and checks the first letter, then the first 2, then the first 3, etc.

I plan on using this on text files as large as a novel. The time complexity is horrendous. I think it's O(n# of characters in the text file). Is there any other way to approach this?

def largest_substring(string):

    length = 0
    x=0
    y=0

    for y in range(len(string)):       
        for x in range(len(string)):     
            substring = string[y:x]                   
            if len(list(re.finditer(re.escape(substring),string))) > 1  and len(substring) > length:
                match = substring
                length = len(substring)
    return match

Solution

I guess a prefix tree might help.

  • Build a parallel array of pointers (sorry for a C-speak) into an original text.



  • Sort it.



  • Scan it for a longest match in two consecutive entries.



Overall complexity is \$O(ns)\$

Where \$n\$ is the length of the text and \$s\$ is the length of the longest recurring substring.

Context

StackExchange Code Review Q#63329, answer score: 3

Revisions (0)

No revisions yet.