snippetMinor
How can you compute the expected edit distance in $O(2^{3n/2})$ time?
Viewed 0 times
editcantheyoucomputeexpecteddistancetimehow
Problem
In a coding challenge an answer claimed to be able to compute the expected edit distance between two binary strings of length $n$ in $O(2^{3n/2})$ edit distance calculations by dynamic programming. A naive solution would requite $O(2^{2n})$ edit distance calculations.
The answer gives optimized code but no mathematical or algorithmic explanation. I cannot see how to achieve the claimed complexity. How is this possible?
The answer gives optimized code but no mathematical or algorithmic explanation. I cannot see how to achieve the claimed complexity. How is this possible?
Solution
Heh…for the record, all I said was that the time seems to scale as approximately $\tilde O(2^{1.5n})$. If this was anything more than a conjecture based on the timing data that I included in the post, I would have said something more specific!
I’m sure you’re familiar with the usual Wagner–Fischer algorithm for Levenshtein distance, where we compute $d_{i,j} = d(a_{[0, i)}, b_{[0, j)})$ based on $a_{i-1}, b_{j-1}, d_{i-1,j-1}, d_{i-1,j}, d_{i,j-1}$. From there, my algorithm is based on two observations.
Firstly, each column $d_{,j}$ is a function of the previous column $d_{,j-1}$, the string $a$, and the bit $b_{j-1}$. We iterate over all strings $a$ (modulo symmetries), but instead of iterating over all strings $b$, we map each possible column $d_{,j-1}$ to the two possible columns $d_{,j}$, storing the collection of possible columns in a hash map with a counter for duplicates. This allows us to deduplicate some work when many different prefixes $b_{[0, j)}$ lead to the same column $d_{*,j}$.
Secondly, using a separate dynamic programming algorithm, we can precompute an upper bound $w_{i,j}$ on $d(a_{[i, n)}, b_{[j, n)})$ based only on $a$. Then, given a column $d_{*,j}$, the final distance $d_{n,n}$ can never be more than $m = \min_i (d_{i,j} + w_{i,j})$, so we can replace each $d_{i,j}$ with $\min \{d_{i,j}, m - |i - j|\}$ without affecting the final result. This lets us recognize more columns as duplicates and deduplicate more work.
I’m sure you’re familiar with the usual Wagner–Fischer algorithm for Levenshtein distance, where we compute $d_{i,j} = d(a_{[0, i)}, b_{[0, j)})$ based on $a_{i-1}, b_{j-1}, d_{i-1,j-1}, d_{i-1,j}, d_{i,j-1}$. From there, my algorithm is based on two observations.
Firstly, each column $d_{,j}$ is a function of the previous column $d_{,j-1}$, the string $a$, and the bit $b_{j-1}$. We iterate over all strings $a$ (modulo symmetries), but instead of iterating over all strings $b$, we map each possible column $d_{,j-1}$ to the two possible columns $d_{,j}$, storing the collection of possible columns in a hash map with a counter for duplicates. This allows us to deduplicate some work when many different prefixes $b_{[0, j)}$ lead to the same column $d_{*,j}$.
Secondly, using a separate dynamic programming algorithm, we can precompute an upper bound $w_{i,j}$ on $d(a_{[i, n)}, b_{[j, n)})$ based only on $a$. Then, given a column $d_{*,j}$, the final distance $d_{n,n}$ can never be more than $m = \min_i (d_{i,j} + w_{i,j})$, so we can replace each $d_{i,j}$ with $\min \{d_{i,j}, m - |i - j|\}$ without affecting the final result. This lets us recognize more columns as duplicates and deduplicate more work.
Context
StackExchange Computer Science Q#119949, answer score: 5
Revisions (0)
No revisions yet.