HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

I'm looking for an algorithm to find unknown patterns in a string

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
unknownpatternsalgorithmlookingforfindstring

Problem

I am trying to detect patterns in huge code bases.
I managed to filter the entire codebase into a tagged string,
as in:
ABACBABAABBCBABA

The result should be:
ABA *3
CBA *2

I'm trying to build / use an algorithm which will find ANY unknown repeating pattern inside the string. The length of the pattern, it's composition, and the number of repeats is unknow.

To be a pattern it must occur atleast twice. And have atleast 2 items.
Once I detect the patterns I can represent them back in their original context

I have tried iterating over each tag.
For each tag find the following tag in the string.
continue until adding a tag matches only one repeat - hence no more pattern.

I get lost in implemetation (in JS or Python) and I'm hoping there is a better way.
Thanks.

Solution

Construct the suffix tree of your string, which takes time linear in the length of the string (assuming a finite alphabet). Every inner node represents a repeat, their respective descendant leaves encode the positions.

Context

StackExchange Computer Science Q#79182, answer score: 5

Revisions (0)

No revisions yet.