HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

indexOf() and Boyer Moore methods

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
indexofmethodsandboyermoore

Problem

I have a method I created which is a custom indexOf(). This code seems ugly to me so I was wondering if anyone could recommend a cleaner and perhaps faster implementation of my method.

public static int myIndexOf(char[] str, char[] substr, int index) {
    if (str == null) {
        throw new NullPointerException();
    }
    if (substr.length > str.length) {
        return -1;
    }
    if (substr.length == 0 && index >= str.length) {
        return str.length;
    }
    if (substr.length == 0 && index >= 0) {
        return index;
    }
    if (substr.length == 0 && index = index) {
            return i;
        }
    }
    // can not found substring after index
    return -1;
}

// Don't necessarily need any modification for this method. But for those who are interested
Boyer Moore:

public static int[] boyerMoore(char[] str, char[] substr) { 
    // index array of indexes of found substrings
    // initialize
    int indexCounter = 0;
    int[] subStringIndex = new int[str.length];
    for (int c = 0; c = 0; j--) {
            if (substr[j] != str[i + j]) {
                skip = Math.max(1, j - charSet[str[i + j]]);
                    break;
            }
        }

        if (skip == 0) {
            // increment next value in String
            skip++;
            // add found substring (beginning index)
            subStringIndex[indexCounter] = i;
            indexCounter++;
            // this last bit is used if Boyer Moore stops searching after first instance is found
            // return i;

        }
    }

    return  subStringIndex;
    // This bit is used when Boyer Moore is stops after the first instance in found
    // cannot be found
    // return str.length;
}

Solution

Naming

-
Choose an existing idiom such as "find a needle in a haystack" and consider putting the parameters in that order.

-
The name index implies some generic location within haystack. However, start makes it clear why that index is significant.

The rest of my answer assumes these changes.

Validating Input

-
You don't need to manually throw NullPointerException for null values when accessing the length of each string will do it automatically. If not, make sure to check both strings for consistency.

-
Consider how to handle negative values for start, typically by taking it from the end of the haystack. You'll need to decide whether to use modulo arithmetic or constrain overly-negative values to zero.

if start < 0
    start = Math.max(0, haystack.length + start)


Logic Yoga

-
Combine identical prefix expressions in an if chain.

if (A && R) { X }
if (A && S) { Y }
if (A && T) { Z }


This can be simplified and optimized to

if (A) {
    if (R) { X }
    if (S) { Y }
    if (T) { Z }
}


It works for || as well, though I can't think of a recent sighting in the wild.

if (A || R) { X }
if (A || S) { Y }
if (A || T) { Z }


becomes

if (A)
    X Y Z
else
    // same as "if (A)" above


-
Clarify intent by using else if after blocks that should exit. Adding else lets the compiler enforce that intent when they're modified.

myIndexOf(needle, haystack, start)
    if needle is larger than haystack
        return ...
    else if needle is empty
        return ...
    else
        return ...


Extract Methods

-
Move generic helper methods such as constraining an index to a library. Apache Commons probably has one. This way someone doesn't have to read the code three times to infer its intended purpose (versus what it actually does). Here index makes more sense without a context.

constrain(index, size)
    return Math.max(0, Math.min(index, size))


-
Separate parameter checking from the raw search algorithm when either is long enough.

findBoyerMore(needle, haystack, start)
    for index in boyerMoore(needle, haystack)
        if index >= start
            return index
    return -1


-
Extract a method to test if needle can fit inside haystack at start.

Also, you can fail fast in the case of myIndexOf("foo", "foobar", 5) without calling boyerMoore even though foo is shorter than foobar. Take start into consideration when comparing the length of the two strings.

canFitAt(needle, haystack, start)
    return start + needle.length <= haystack.length


Dead Code

  • Since index >= 0 and `index



End Result

Combining all of the above gives us a much-simplified method.

myIndexOf(needle, haystack, start)
    if canFitAt(needle, haystack, start)
        return findBoyerMoore(needle, haystack, start)
    else
        return constrain(start, haystack.length)

Code Snippets

if start < 0
    start = Math.max(0, haystack.length + start)
if (A && R) { X }
if (A && S) { Y }
if (A && T) { Z }
if (A) {
    if (R) { X }
    if (S) { Y }
    if (T) { Z }
}
if (A || R) { X }
if (A || S) { Y }
if (A || T) { Z }
if (A)
    X Y Z
else
    // same as "if (A)" above

Context

StackExchange Code Review Q#45362, answer score: 5

Revisions (0)

No revisions yet.