HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Micro-optimisation for edit distance computation: is it valid?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
microeditcomputationdistanceoptimisationforvalid

Problem

On Wikipedia, an implementation for the bottom-up dynamic programming scheme for the edit distance is given. It does not follow the definition completely; inner cells are computed thus:

if s[i] = t[j] then  
  d[i, j] := d[i-1, j-1]       // no operation required
else
  d[i, j] := minimum
             (
               d[i-1, j] + 1,  // a deletion
               d[i, j-1] + 1,  // an insertion
               d[i-1, j-1] + 1 // a substitution
             )
}


As you can see, the algorithm always chooses the value from the upper-left neighbour if there is a match, saving some memory accesses, ALU operations and comparisons.

However, deletion (or insertion) may result in a smaller value, thus the algorithm is locally incorrect, i.e. it breaks with the optimality criterion. But maybe the mistake does not change the end result -- it might be cancelled out.

Is this micro-optimisation valid, and why (not)?

Solution

I don't think that the algorithm is flawed. If two strings are matched, we compare first its last two characters (and then recurse). If they are the same, we can match them to get an optimal alignment. For example, consider the strings test and testat.
If you don't match the two last ts, than one of the ts remains unmatched, since otherwise your matching would look like this:

This is impossible, since the arrows are not allowed to "cross". The matched t induces several inserts (green boxes in the figure), as depicted on the left:

But then you can simply find an equally good alignment, depicted on the right. In both cases you match a t and you have two inserts.

The argument for a substitution of one of the last ts is the same. So if you substitute one of the last ts, then you can instead match the last two t, and get a better alignment (see the picture).

Context

StackExchange Computer Science Q#2985, answer score: 6

Revisions (0)

No revisions yet.