HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Karp-Rabin String Matching, Part 2

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rabinpartstringmatchingkarp

Problem

Part 1 is here. I wrote a Java implementation of the Karp-Rabin semi-numerical string matching algorithm. I incorporated the suggestions of @fge and @maaartinus from the other question into my new code: the match method now returns a boolean so you can write if (KarpRabin.match(pattern, text)), and I changed the underlying algorithm to use arithmetic mod \$2^{32}\$ with an arbitrary base, so that overflow isn't a problem and any character supported by Java's Strings (which are UTF-16) can be understood.

I also added another method, allMatches, which finds all positions in the string which match the given pattern and returns them in a list. It wasn't obvious to me how to split up allMatches and match so I could reuse the code, so most of it is repeated; that's one major issue I'd like suggestions for.

The current implementation of allMatches is not consistently typed because it returns Collections.EMPTY_LIST (with no generic type parameter) when no matches are found, but a List when matches are found. I did this because I like using it as in this code:

List result = KarpRabin.allMatches(pattern, text);
if (result == Collections.EMPTY_LIST) {
    // do whatever you do when nothing matches
}


better than as in this code:

List result = KarpRabin.allMatches(pattern, text);
if (result.size() == 0) {
     // do whatever you do when nothing matches
}


But this breaks generic type consistency and generates a compiler warning, so I'm open to other ways of doing it.

Here's the code:

```
public class KarpRabin {
private static final int BASE = 103; // Arbitrary base for hash.

/**
Finds first match, returns true, false if no match found.
*/
public static boolean match(String pattern, String text) {
if (pattern.length() > text.length()) {
return false;
}
int out, phash, thash;
out = 1;

for (int expt = pattern.length(); expt > 1; --expt) {
out *= BASE;

Solution

Your concerns about:

List result = KarpRabin.allMatches(pattern, text);
if (result == Collections.EMPTY_LIST) {
    // do whatever you do when nothing matches
}


are real. That's a problem. On the other hand, there's the standard isEmpty() helper method:

if (result.isEmpty()) {
    // do whatever you do when nothing matches
}


In this code loop here:

for (int expt = pattern.length(); expt > 1; --expt) {
        out *= BASE;
    }


it is not obvious why you loop 1-less time than you have characters in the code. Additionally, I am uncertain why the backward-direction loop is useful. Should be:

for (int i = 1; i < pattern.length(); i++) {
        out *= BASE;
    }


As I say, though, the loop above does 1-less than the number of chars in the pattern, should it be indexed from 0? If not, it should be commented.

As for the duplication of code, you should make the class have a private constructor, and you should create an instance of the class each time you call one of the static 'search' methods.

When you create the instance, you create the various state information (the phash, the thash, etc.). Then, have a next() method that just gets the location of the next match. Your two static methods would then look like:

public static boolean match(String pattern, String text) {
    KarpRabin kr = new KarpRobin(pattern, text);
    return kr.next() >= 0;
}

public static List allMatches(String pattern, String text) {
    KarpRabin kr = new KarpRobin(pattern, text);
    List matches = new ArrayList();
    int pos = 0;
    while ((pos = kr.next()) >= 0) {
        matches.add(pos);
    }
    return matches;
}


Then, what you need to do is make the actual KarpRobin class more 'stateful', it knows where it is in the search, and it finds the next match each time. The code should be fairly easy to copy/paste from your current code.

Code Snippets

List<Integer> result = KarpRabin.allMatches(pattern, text);
if (result == Collections.EMPTY_LIST) {
    // do whatever you do when nothing matches
}
if (result.isEmpty()) {
    // do whatever you do when nothing matches
}
for (int expt = pattern.length(); expt > 1; --expt) {
        out *= BASE;
    }
for (int i = 1; i < pattern.length(); i++) {
        out *= BASE;
    }
public static boolean match(String pattern, String text) {
    KarpRabin kr = new KarpRobin(pattern, text);
    return kr.next() >= 0;
}

public static List<Integer> allMatches(String pattern, String text) {
    KarpRabin kr = new KarpRobin(pattern, text);
    List<Integer> matches = new ArrayList<Integer>();
    int pos = 0;
    while ((pos = kr.next()) >= 0) {
        matches.add(pos);
    }
    return matches;
}

Context

StackExchange Code Review Q#70066, answer score: 4

Revisions (0)

No revisions yet.