patternjavaMinor
Approximate String Matching Interview Question
Viewed 0 times
approximateinterviewquestionstringmatching
Problem
I was working on the challenge Save Humanity from Interviewstreet for a while then gave up, solved a few other challenges, and have come back to it again.
The code below generates the correct answers but the time complexity
The code below generates the correct answers but the time complexity
O(text * pattern) is not sufficient for the auto-grader. Could anyone perhaps give me a hint or a pointer on how to improve my algorithm? final BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
final StringBuilder builder = new StringBuilder();
final int number_trials = Integer.valueOf(input.readLine());
for(int trial = 0; trial 0 && builder.charAt(builder.length() - 1) == ' ')
{
builder.deleteCharAt(builder.length() - 1);
}
if(trial + 1 < number_trials)
{
builder.append("\n");
}
}
System.out.println(builder.toString());Solution
1.I would use something like "APPROXIMATE BOYER-MOORE STRING MATCHING" described here:
http://www.cs.hut.fi/~tarhio/papers/abm.pdf
2.The second option is the Suffix Tree Solution:
http://homepage.usask.ca/~ctl271/810/approximate_matching.shtml
The Suffix Tree approach to the k-mismatch problem is essentially the same as the approach taken when matching with wild-cards; though in this case k is the number of allowable mismatches rather than the number of wildcards. The algorithm proceeds by executing up to k constant-time lce queries for each position i in the text. If the the lce queries span the end of P, then P occurs starting at position i of the text.
An alternate approach, when both k and the alphabet size are relatively small is to generate all permutations of P with up to k changes and then search for each of these permutations in the tree. This would run in time O(P′ + m), where P′ is the combined length of the permissible permutations of P.
http://www.cs.hut.fi/~tarhio/papers/abm.pdf
2.The second option is the Suffix Tree Solution:
http://homepage.usask.ca/~ctl271/810/approximate_matching.shtml
The Suffix Tree approach to the k-mismatch problem is essentially the same as the approach taken when matching with wild-cards; though in this case k is the number of allowable mismatches rather than the number of wildcards. The algorithm proceeds by executing up to k constant-time lce queries for each position i in the text. If the the lce queries span the end of P, then P occurs starting at position i of the text.
for i = 1; i < m-n+1; i++
1. j = 1, i′ = i, count = 0
2. l = lce (P[j], T[i′])
3. if j+l = n+1, P occurs in T at i with only count mismatches; continue next i;
4. if count ≤ k, count++, j = j+l+1, i′=i′+l+1, go to 2
5. if count = k+1, then a k mismatch of P does not occur at i; continue next i;An alternate approach, when both k and the alphabet size are relatively small is to generate all permutations of P with up to k changes and then search for each of these permutations in the tree. This would run in time O(P′ + m), where P′ is the combined length of the permissible permutations of P.
Code Snippets
for i = 1; i < m-n+1; i++
1. j = 1, i′ = i, count = 0
2. l = lce (P[j], T[i′])
3. if j+l = n+1, P occurs in T at i with only count mismatches; continue next i;
4. if count ≤ k, count++, j = j+l+1, i′=i′+l+1, go to 2
5. if count = k+1, then a k mismatch of P does not occur at i; continue next i;Context
StackExchange Code Review Q#13383, answer score: 2
Revisions (0)
No revisions yet.