HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

How can you compute the expected edit distance in $O(2^{3n/2})$ time?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
editcantheyoucomputeexpecteddistancetimehow

Problem

In a coding challenge an answer claimed to be able to compute the expected edit distance between two binary strings of length $n$ in $O(2^{3n/2})$ edit distance calculations by dynamic programming. A naive solution would requite $O(2^{2n})$ edit distance calculations.

The answer gives optimized code but no mathematical or algorithmic explanation. I cannot see how to achieve the claimed complexity. How is this possible?

Solution

Heh…for the record, all I said was that the time seems to scale as approximately $\tilde O(2^{1.5n})$. If this was anything more than a conjecture based on the timing data that I included in the post, I would have said something more specific!

I’m sure you’re familiar with the usual Wagner–Fischer algorithm for Levenshtein distance, where we compute $d_{i,j} = d(a_{[0, i)}, b_{[0, j)})$ based on $a_{i-1}, b_{j-1}, d_{i-1,j-1}, d_{i-1,j}, d_{i,j-1}$. From there, my algorithm is based on two observations.

Firstly, each column $d_{,j}$ is a function of the previous column $d_{,j-1}$, the string $a$, and the bit $b_{j-1}$. We iterate over all strings $a$ (modulo symmetries), but instead of iterating over all strings $b$, we map each possible column $d_{,j-1}$ to the two possible columns $d_{,j}$, storing the collection of possible columns in a hash map with a counter for duplicates. This allows us to deduplicate some work when many different prefixes $b_{[0, j)}$ lead to the same column $d_{*,j}$.

Secondly, using a separate dynamic programming algorithm, we can precompute an upper bound $w_{i,j}$ on $d(a_{[i, n)}, b_{[j, n)})$ based only on $a$. Then, given a column $d_{*,j}$, the final distance $d_{n,n}$ can never be more than $m = \min_i (d_{i,j} + w_{i,j})$, so we can replace each $d_{i,j}$ with $\min \{d_{i,j}, m - |i - j|\}$ without affecting the final result. This lets us recognize more columns as duplicates and deduplicate more work.

Context

StackExchange Computer Science Q#119949, answer score: 5

Revisions (0)

No revisions yet.