HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Comparing two approximate string matching algorithms in Java

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
approximatecomparingjavaalgorithmstwostringmatching

Problem

Problem description

Given a text \$T[0..n)\$, a pattern \$P[0..m)\$, and a nonnegative
integer \$k\$, report all positions \$j \in [0..n)\$ such that \$ed(P,
T(j - l..j]) \leq k\$ for some \$l \geq 0\$, where \$ed\$ is the
Levenshtein distance between two input strings.

Question

I am comparing two algorithms: the trivial one, and the one, which incorporates Ukkonen's heuristic for pruning computing the entire distance matrix.

See what I have:

ApproximateStringMatcher.java

package net.coderodde.string.matching.approximate;

import java.util.List;

/**
 * This interface defines the API for approximate string matching algorithms.
 * 
 * @author Rodion "rodde" Efremov
 * @version 1.6 (Mar 23, 2016)
 */
public interface ApproximateStringMatcher {

    /**
     * Returns the list of all approximate matches of {@code pattern} in 
     * {@code text}. The edit distance between an approximate match and the 
     * pattern is no more than {@code maximumEditDistance}.
     * 
     * @param text                the text to search in.
     * @param pattern             the pattern to search for.
     * @param maximumEditDistance the maximum allowed edit distance.
     * @return a list of the last indices of all approximate matches.
     */
    public List match(String text, 
                               String pattern, 
                               int maximumEditDistance);
}


DefaultApproximateStringMatcher.java

```
package net.coderodde.string.matching.approximate.support;

import java.util.ArrayList;
import java.util.List;
import static net.coderodde.misc.Miscellanea.delta;
import static net.coderodde.misc.Miscellanea.min;
import net.coderodde.string.matching.approximate.ApproximateStringMatcher;

/**
* This class implements a default approximate string matching algorithm.
*
* @author Rodion "rodde" Efremov
* @version 1.6 (Mar 23, 2016)
*/
public class DefaultApproximateStringMatcher
implements ApproximateStringMatcher {

@Override
public List

Solution

Overall very nice programming. I am glad that you correctly declared and implemented ApproximateStringMatcher.

In your Miscellanea.min method you could have used min = Math.min(min, ints[i]) instead of that if.

I would consider to think about the Miscellanea.delta method.
The only thing he is doing is to do a ternary, I wonder if I preferred to have that code in place so I didn't have the need to navigate the method to see what it does.
Aside of this what you definitely want to do is to have a variable to store the result so you shorten the g[i][j] assignment.

public List match(String text, String pattern, int maximumEditDistance) {
   //...
    for (int j = 1; j <= n; ++j) {
        for (int i = 1; i <= top; ++i) {
            //consider to do the ternary here instead
            int delta = delta(pattern.charAt(i - 1), text.charAt(j - 1)); 
            g[i][j] = min(g[i - 1][j - 1] + delta, g[i - 1][j] + 1, g[i][j - 1] + 1);
        }
        //...
    }
    //...
}

Code Snippets

public List<Integer> match(String text, String pattern, int maximumEditDistance) {
   //...
    for (int j = 1; j <= n; ++j) {
        for (int i = 1; i <= top; ++i) {
            //consider to do the ternary here instead
            int delta = delta(pattern.charAt(i - 1), text.charAt(j - 1)); 
            g[i][j] = min(g[i - 1][j - 1] + delta, g[i - 1][j] + 1, g[i][j - 1] + 1);
        }
        //...
    }
    //...
}

Context

StackExchange Code Review Q#123661, answer score: 2

Revisions (0)

No revisions yet.