HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Approximate String Matching Interview Question

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
approximateinterviewquestionstringmatching

Problem

I was working on the challenge Save Humanity from Interviewstreet for a while then gave up, solved a few other challenges, and have come back to it again.

The code below generates the correct answers but the time complexity O(text * pattern) is not sufficient for the auto-grader. Could anyone perhaps give me a hint or a pointer on how to improve my algorithm?

final BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
final StringBuilder builder = new StringBuilder();
final int number_trials = Integer.valueOf(input.readLine());

for(int trial = 0; trial  0 && builder.charAt(builder.length() - 1) == ' ')
    {
        builder.deleteCharAt(builder.length() - 1);
    }           

    if(trial + 1 < number_trials)
    {
        builder.append("\n");
    }       
}

System.out.println(builder.toString());

Solution

1.I would use something like "APPROXIMATE BOYER-MOORE STRING MATCHING" described here:

http://www.cs.hut.fi/~tarhio/papers/abm.pdf

2.The second option is the Suffix Tree Solution:

http://homepage.usask.ca/~ctl271/810/approximate_matching.shtml

The Suffix Tree approach to the k-mismatch problem is essentially the same as the approach taken when matching with wild-cards; though in this case k is the number of allowable mismatches rather than the number of wildcards. The algorithm proceeds by executing up to k constant-time lce queries for each position i in the text. If the the lce queries span the end of P, then P occurs starting at position i of the text.

for i = 1; i < m-n+1; i++
  1. j = 1, i′ = i, count = 0
  2. l = lce (P[j], T[i′])
  3. if j+l = n+1, P occurs in T at i with only count mismatches; continue next i;
  4. if count ≤ k, count++, j = j+l+1, i′=i′+l+1, go to 2
  5. if count = k+1, then a k mismatch of P does not occur at i; continue next i;


An alternate approach, when both k and the alphabet size are relatively small is to generate all permutations of P with up to k changes and then search for each of these permutations in the tree. This would run in time O(P′ + m), where P′ is the combined length of the permissible permutations of P.

Code Snippets

for i = 1; i < m-n+1; i++
  1. j = 1, i′ = i, count = 0
  2. l = lce (P[j], T[i′])
  3. if j+l = n+1, P occurs in T at i with only count mismatches; continue next i;
  4. if count ≤ k, count++, j = j+l+1, i′=i′+l+1, go to 2
  5. if count = k+1, then a k mismatch of P does not occur at i; continue next i;

Context

StackExchange Code Review Q#13383, answer score: 2

Revisions (0)

No revisions yet.