patternMinor
Fuzzy string matching algorithm with allowed events?
Viewed 0 times
allowedeventswithalgorithmstringmatchingfuzzy
Problem
I want to be able to locate a substring in a string allowing for a specified number of mismatches, insertions and deletions - and at the same time know how many mismatches, insertions and deletions were used for any match.
Using brute force backtrack I can find the matches, but I cannot guarantee that the match was produced using the fewest permutations possible.
Using dynamic programming I can find the matches and guarantee that the match was produced using the fewest permutations possible, but I cannot specify a number of allowed mismatches, insertion and deletions - only a total edit distance.
Using brute force backtrack I can find the matches, but I cannot guarantee that the match was produced using the fewest permutations possible.
Using dynamic programming I can find the matches and guarantee that the match was produced using the fewest permutations possible, but I cannot specify a number of allowed mismatches, insertion and deletions - only a total edit distance.
Solution
What you want is the so-called (optimal) local alignment of two strings. You can use the standard local alignment algorithm but allow insertions/deletions at the beginning and end (of the the shorter string) for free.
In the classic dynamic programming algorithm, this corresponds to initialising the first row (resp. column) with zero, removing the penalty for steps along the last row (resp. column) and finally looking for the minimum cost in the last row (resp. column) and backtrack from there. You can find alternatives in the Wikipedia article.
Restricting the number of deletions or insertions used can be implemented by adding counters to the cell values; if adding one would take one over the respective limit (stored globally), assign cost $\infty$ to the respective operation.
In the classic dynamic programming algorithm, this corresponds to initialising the first row (resp. column) with zero, removing the penalty for steps along the last row (resp. column) and finally looking for the minimum cost in the last row (resp. column) and backtrack from there. You can find alternatives in the Wikipedia article.
Restricting the number of deletions or insertions used can be implemented by adding counters to the cell values; if adding one would take one over the respective limit (stored globally), assign cost $\infty$ to the respective operation.
Context
StackExchange Computer Science Q#7040, answer score: 2
Revisions (0)
No revisions yet.