patternjavaMinor
Comparing two approximate string matching algorithms in Java
Viewed 0 times
approximatecomparingjavaalgorithmstwostringmatching
Problem
Problem description
Given a text \$T[0..n)\$, a pattern \$P[0..m)\$, and a nonnegative
integer \$k\$, report all positions \$j \in [0..n)\$ such that \$ed(P,
T(j - l..j]) \leq k\$ for some \$l \geq 0\$, where \$ed\$ is the
Levenshtein distance between two input strings.
Question
I am comparing two algorithms: the trivial one, and the one, which incorporates Ukkonen's heuristic for pruning computing the entire distance matrix.
See what I have:
ApproximateStringMatcher.java
DefaultApproximateStringMatcher.java
```
package net.coderodde.string.matching.approximate.support;
import java.util.ArrayList;
import java.util.List;
import static net.coderodde.misc.Miscellanea.delta;
import static net.coderodde.misc.Miscellanea.min;
import net.coderodde.string.matching.approximate.ApproximateStringMatcher;
/**
* This class implements a default approximate string matching algorithm.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Mar 23, 2016)
*/
public class DefaultApproximateStringMatcher
implements ApproximateStringMatcher {
@Override
public List
Given a text \$T[0..n)\$, a pattern \$P[0..m)\$, and a nonnegative
integer \$k\$, report all positions \$j \in [0..n)\$ such that \$ed(P,
T(j - l..j]) \leq k\$ for some \$l \geq 0\$, where \$ed\$ is the
Levenshtein distance between two input strings.
Question
I am comparing two algorithms: the trivial one, and the one, which incorporates Ukkonen's heuristic for pruning computing the entire distance matrix.
See what I have:
ApproximateStringMatcher.java
package net.coderodde.string.matching.approximate;
import java.util.List;
/**
* This interface defines the API for approximate string matching algorithms.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Mar 23, 2016)
*/
public interface ApproximateStringMatcher {
/**
* Returns the list of all approximate matches of {@code pattern} in
* {@code text}. The edit distance between an approximate match and the
* pattern is no more than {@code maximumEditDistance}.
*
* @param text the text to search in.
* @param pattern the pattern to search for.
* @param maximumEditDistance the maximum allowed edit distance.
* @return a list of the last indices of all approximate matches.
*/
public List match(String text,
String pattern,
int maximumEditDistance);
}DefaultApproximateStringMatcher.java
```
package net.coderodde.string.matching.approximate.support;
import java.util.ArrayList;
import java.util.List;
import static net.coderodde.misc.Miscellanea.delta;
import static net.coderodde.misc.Miscellanea.min;
import net.coderodde.string.matching.approximate.ApproximateStringMatcher;
/**
* This class implements a default approximate string matching algorithm.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Mar 23, 2016)
*/
public class DefaultApproximateStringMatcher
implements ApproximateStringMatcher {
@Override
public List
Solution
Overall very nice programming. I am glad that you correctly declared and implemented
In your
I would consider to think about the
The only thing he is doing is to do a ternary, I wonder if I preferred to have that code in place so I didn't have the need to navigate the method to see what it does.
Aside of this what you definitely want to do is to have a variable to store the result so you shorten the
ApproximateStringMatcher.In your
Miscellanea.min method you could have used min = Math.min(min, ints[i]) instead of that if.I would consider to think about the
Miscellanea.delta method. The only thing he is doing is to do a ternary, I wonder if I preferred to have that code in place so I didn't have the need to navigate the method to see what it does.
Aside of this what you definitely want to do is to have a variable to store the result so you shorten the
g[i][j] assignment.public List match(String text, String pattern, int maximumEditDistance) {
//...
for (int j = 1; j <= n; ++j) {
for (int i = 1; i <= top; ++i) {
//consider to do the ternary here instead
int delta = delta(pattern.charAt(i - 1), text.charAt(j - 1));
g[i][j] = min(g[i - 1][j - 1] + delta, g[i - 1][j] + 1, g[i][j - 1] + 1);
}
//...
}
//...
}Code Snippets
public List<Integer> match(String text, String pattern, int maximumEditDistance) {
//...
for (int j = 1; j <= n; ++j) {
for (int i = 1; i <= top; ++i) {
//consider to do the ternary here instead
int delta = delta(pattern.charAt(i - 1), text.charAt(j - 1));
g[i][j] = min(g[i - 1][j - 1] + delta, g[i - 1][j] + 1, g[i][j - 1] + 1);
}
//...
}
//...
}Context
StackExchange Code Review Q#123661, answer score: 2
Revisions (0)
No revisions yet.