patternMinor
I'm looking for an algorithm to find unknown patterns in a string
Viewed 0 times
unknownpatternsalgorithmlookingforfindstring
Problem
I am trying to detect patterns in huge code bases.
I managed to filter the entire codebase into a tagged string,
as in:
ABACBABAABBCBABA
The result should be:
ABA *3
CBA *2
I'm trying to build / use an algorithm which will find ANY unknown repeating pattern inside the string. The length of the pattern, it's composition, and the number of repeats is unknow.
To be a pattern it must occur atleast twice. And have atleast 2 items.
Once I detect the patterns I can represent them back in their original context
I have tried iterating over each tag.
For each tag find the following tag in the string.
continue until adding a tag matches only one repeat - hence no more pattern.
I get lost in implemetation (in JS or Python) and I'm hoping there is a better way.
Thanks.
I managed to filter the entire codebase into a tagged string,
as in:
ABACBABAABBCBABA
The result should be:
ABA *3
CBA *2
I'm trying to build / use an algorithm which will find ANY unknown repeating pattern inside the string. The length of the pattern, it's composition, and the number of repeats is unknow.
To be a pattern it must occur atleast twice. And have atleast 2 items.
Once I detect the patterns I can represent them back in their original context
I have tried iterating over each tag.
For each tag find the following tag in the string.
continue until adding a tag matches only one repeat - hence no more pattern.
I get lost in implemetation (in JS or Python) and I'm hoping there is a better way.
Thanks.
Solution
Construct the suffix tree of your string, which takes time linear in the length of the string (assuming a finite alphabet). Every inner node represents a repeat, their respective descendant leaves encode the positions.
Context
StackExchange Computer Science Q#79182, answer score: 5
Revisions (0)
No revisions yet.