patternMinor
Does KMP Algorithm handle finding multiple matches in overlapping cases?
Viewed 0 times
matcheshandlealgorithmoverlappingkmpdoesfindingmultiplecases
Problem
I am wondering if KMP algorithm can find multiple matches in a given string when there is an overlap.
Here is an example.
Word to Search from (S):
Search word (W):
If you look at "W", it matches "S" from index 0 as well as from index 3
When I looked at the pseudo code on Wikipedia and implemented it, KMP algorithm seems to return only after the first match (
Am I implementing it wrong or does KMP doesn't handle overlapping cases?
Here is an example.
Word to Search from (S):
abcabcabc Search word (W):
abcabcIf you look at "W", it matches "S" from index 0 as well as from index 3
- (abcabc)abc
- abc(abcabc)
When I looked at the pseudo code on Wikipedia and implemented it, KMP algorithm seems to return only after the first match (
(abcabc)abc).Am I implementing it wrong or does KMP doesn't handle overlapping cases?
Solution
This most likely is dependent on the implementation. The failure function is used when the matching of "next" character fails, thus falling back to a prefix of the pattern which is the longest proper suffix up to that point.
Now when we find a match, we are at a accepting state. This accepting state has no need to look at the "next" character because it accepts no more characters thus it should automatically fail at the next character. This is easier to see in a String Matching DFA. Note that this comparison is valid because KMP is based on a string matching DFA.
We can create a DFA for "ABCABC" (dotted lines are failure functions):
The accepting state automatically fails on whatever input is next, but should pick back up at state 3. So if you implement the failures correctly, it should continue to find the overlapping occurrences.
If you look at the worked examples of the failure function you can see the last state (accepting state) has the length of the longest suffix which is also a proper prefix. This is what you should use to "auto-fail" back to after finding a match. I believe this is also captured in the pseudocode for KMP, where it sets
The
Now when we find a match, we are at a accepting state. This accepting state has no need to look at the "next" character because it accepts no more characters thus it should automatically fail at the next character. This is easier to see in a String Matching DFA. Note that this comparison is valid because KMP is based on a string matching DFA.
We can create a DFA for "ABCABC" (dotted lines are failure functions):
The accepting state automatically fails on whatever input is next, but should pick back up at state 3. So if you implement the failures correctly, it should continue to find the overlapping occurrences.
If you look at the worked examples of the failure function you can see the last state (accepting state) has the length of the longest suffix which is also a proper prefix. This is what you should use to "auto-fail" back to after finding a match. I believe this is also captured in the pseudocode for KMP, where it sets
m and i:...
if i = length(W) then
return m (or store/use occurrence)
let m = m + i - T[i], i = T[i] (preparing for search next occurrence)
...The
return here should be removed if you want to find more than one match.Code Snippets
...
if i = length(W) then
return m (or store/use occurrence)
let m = m + i - T[i], i = T[i] (preparing for search next occurrence)
...Context
StackExchange Computer Science Q#76339, answer score: 6
Revisions (0)
No revisions yet.