HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

String Matching w/o repeating the comparisons for common part of the two strings

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thepartrepeatingtwocommonforstringsstringmatchingcomparisons

Problem

In an attempt to think up a more efficient String Matching algorithm than the naïve \$O(n \cdot k)\$ one I came up with a slightly modified approach.

In my algorithm (code in Java), I tried a little modification of the \$O(n \cdot k)\$ approach where every time a character match starting from i fails after k positions, we start from i+1th place.

In this algo however, if matching fails at k position, I resume comparison from k-1 because we know up to k-1 the characters are common.

Below is the implementation of this approach.

I will really appreciate anyone taking a look into the code and

  • Confirm if things work the way I explained. My tests seem to be working fine.



  • What is the complexity of this approach? Is it worst case \$O(n \cdot k)\$ still?



  • An example of a worst case input.



Code:

public class ModifiedSearch {
    private String text;

    public ModifiedSearch(String text) {
        this.text = text;
    }

    public int findPattern(String pattern) {
        int textLength = text.length();
        int patLength = pattern.length();
        int common = 0;
        boolean broken = false;

        for (int tIndex = 0; tIndex <= textLength - patLength; broken = false, tIndex++) {
            int k = 0;

            while (tIndex + common + k < textLength && common + k < patLength) {

                if (text.charAt(tIndex + common + k) != pattern.charAt(common + k)) {
                    broken = true;
                    common = (common + k - 1 < 0) ? 0 : common + k - 1 ;
                    break;
                }

                k++;
            }

            if (!broken && common + k == patLength) {
                return tIndex;
            }
        }

        return -1;
    }
}


I have tried this with a combination of inputs large and small, but this does not seem to worse out to \$O(n \cdot k)\$. Any help greatly appreciated.

Solution

Confirm if things work the way I explained. My tests seem to be working fine.

I'm afraid not. For example, for text hehehello and pattern hehello, the output is 0 instead of 2. You see, the algorithm prematurely exits after heheh, when it should continue searching from position 2.

With your modified version this test case passes, but fails for text heheehello and pattern hehello, incorrectly returning 3 instead of -1.
(In the future, please do not modify the code in the question after you received answers. Changing the code typically invalidates answers, as it did in this example too.)

See the Knuth-Morris-Pratt algorithm for something that actually works.


What is the complexity of this approach? Is it worst case O(n⋅k) still?

The complexity of your algorithm is \$O(n + k)\$, same as that of KMP. Unfortunately it doesn't really matter, because it doesn't work correctly.

Usage

The usage of your class is a bit awkward:

int index = new ModifiedSearch(text).findPattern(pattern);


I recommend to rewrite to make it work this way:

int index = ModifiedSearch.findPattern(text, pattern);


That is, turn the class into a static utility method.
It will be lighter that you don't have to create an instance of the class.

Looking at the function name "findPattern" alone, it's not obvious what it returns. Since the function returns an index, like String.indexOf, I recommend to rename it to indexOf. Or, rename to contains and make it return a boolean.

Code Snippets

int index = new ModifiedSearch(text).findPattern(pattern);
int index = ModifiedSearch.findPattern(text, pattern);

Context

StackExchange Code Review Q#125825, answer score: 2

Revisions (0)

No revisions yet.