patternjavaMinor
indexOf() and Boyer Moore methods
Viewed 0 times
indexofmethodsandboyermoore
Problem
I have a method I created which is a custom
indexOf(). This code seems ugly to me so I was wondering if anyone could recommend a cleaner and perhaps faster implementation of my method.public static int myIndexOf(char[] str, char[] substr, int index) {
if (str == null) {
throw new NullPointerException();
}
if (substr.length > str.length) {
return -1;
}
if (substr.length == 0 && index >= str.length) {
return str.length;
}
if (substr.length == 0 && index >= 0) {
return index;
}
if (substr.length == 0 && index = index) {
return i;
}
}
// can not found substring after index
return -1;
}
// Don't necessarily need any modification for this method. But for those who are interested
Boyer Moore:
public static int[] boyerMoore(char[] str, char[] substr) {
// index array of indexes of found substrings
// initialize
int indexCounter = 0;
int[] subStringIndex = new int[str.length];
for (int c = 0; c = 0; j--) {
if (substr[j] != str[i + j]) {
skip = Math.max(1, j - charSet[str[i + j]]);
break;
}
}
if (skip == 0) {
// increment next value in String
skip++;
// add found substring (beginning index)
subStringIndex[indexCounter] = i;
indexCounter++;
// this last bit is used if Boyer Moore stops searching after first instance is found
// return i;
}
}
return subStringIndex;
// This bit is used when Boyer Moore is stops after the first instance in found
// cannot be found
// return str.length;
}Solution
Naming
-
Choose an existing idiom such as "find a needle in a haystack" and consider putting the parameters in that order.
-
The name
The rest of my answer assumes these changes.
Validating Input
-
You don't need to manually throw
-
Consider how to handle negative values for
Logic Yoga
-
Combine identical prefix expressions in an
This can be simplified and optimized to
It works for
becomes
-
Clarify intent by using
Extract Methods
-
Move generic helper methods such as constraining an index to a library. Apache Commons probably has one. This way someone doesn't have to read the code three times to infer its intended purpose (versus what it actually does). Here
-
Separate parameter checking from the raw search algorithm when either is long enough.
-
Extract a method to test if
Also, you can fail fast in the case of
Dead Code
End Result
Combining all of the above gives us a much-simplified method.
-
Choose an existing idiom such as "find a needle in a haystack" and consider putting the parameters in that order.
-
The name
index implies some generic location within haystack. However, start makes it clear why that index is significant.The rest of my answer assumes these changes.
Validating Input
-
You don't need to manually throw
NullPointerException for null values when accessing the length of each string will do it automatically. If not, make sure to check both strings for consistency.-
Consider how to handle negative values for
start, typically by taking it from the end of the haystack. You'll need to decide whether to use modulo arithmetic or constrain overly-negative values to zero.if start < 0
start = Math.max(0, haystack.length + start)Logic Yoga
-
Combine identical prefix expressions in an
if chain.if (A && R) { X }
if (A && S) { Y }
if (A && T) { Z }This can be simplified and optimized to
if (A) {
if (R) { X }
if (S) { Y }
if (T) { Z }
}It works for
|| as well, though I can't think of a recent sighting in the wild.if (A || R) { X }
if (A || S) { Y }
if (A || T) { Z }becomes
if (A)
X Y Z
else
// same as "if (A)" above-
Clarify intent by using
else if after blocks that should exit. Adding else lets the compiler enforce that intent when they're modified.myIndexOf(needle, haystack, start)
if needle is larger than haystack
return ...
else if needle is empty
return ...
else
return ...Extract Methods
-
Move generic helper methods such as constraining an index to a library. Apache Commons probably has one. This way someone doesn't have to read the code three times to infer its intended purpose (versus what it actually does). Here
index makes more sense without a context.constrain(index, size)
return Math.max(0, Math.min(index, size))-
Separate parameter checking from the raw search algorithm when either is long enough.
findBoyerMore(needle, haystack, start)
for index in boyerMoore(needle, haystack)
if index >= start
return index
return -1-
Extract a method to test if
needle can fit inside haystack at start.Also, you can fail fast in the case of
myIndexOf("foo", "foobar", 5) without calling boyerMoore even though foo is shorter than foobar. Take start into consideration when comparing the length of the two strings.canFitAt(needle, haystack, start)
return start + needle.length <= haystack.lengthDead Code
- Since
index >= 0and `index
End Result
Combining all of the above gives us a much-simplified method.
myIndexOf(needle, haystack, start)
if canFitAt(needle, haystack, start)
return findBoyerMoore(needle, haystack, start)
else
return constrain(start, haystack.length)Code Snippets
if start < 0
start = Math.max(0, haystack.length + start)if (A && R) { X }
if (A && S) { Y }
if (A && T) { Z }if (A) {
if (R) { X }
if (S) { Y }
if (T) { Z }
}if (A || R) { X }
if (A || S) { Y }
if (A || T) { Z }if (A)
X Y Z
else
// same as "if (A)" aboveContext
StackExchange Code Review Q#45362, answer score: 5
Revisions (0)
No revisions yet.