HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Algorithm for cyclic $n$-string Hamming distance with constant sized language $\Sigma$

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
constantwithsizeddistancelanguagealgorithmsigmaforcyclicstring

Problem

Suppose we are given a language $\Sigma$ where, suppose, $|\Sigma| = O(1)$. Consider two fixed strings $A, B \in \Sigma^n$. Define the Hamming metric between these strings as
$$d_{H}(A,B) = \sum_{i=1}^n \boldsymbol{1}\lbrace A(i) \neq B(i)\rbrace$$
If we define $B^{(k)}$ as the $k$-shift (to the right) cyclic permutation of $B$, then what I am looking to compute is
$$d_{\text{cyc},H}(A,B) = \min_{k \in \lbrace 0, \cdots, n-1 \rbrace} d_H\left(A, B^{(k)}\right)$$
So it is easy to see that we can compute $d_H(A,B)$ for some length $n$ strings $A$ and $B$ in time $O(n)$, implying a trivial $O(n^2)$ algorithm for $d_{\text{cyc},H}(A,B)$. So my goal is to see if we can do something better. If someone knows of an algorithm that generalizes to any constant value for $|\Sigma|$, I would be happy to know. For now, I will lay out some of my thoughts.

Suppose that $|\Sigma| = 2$, namely that $\Sigma = \lbrace \alpha, \beta \rbrace$. Let us define a map $h: \Sigma \rightarrow \lbrace -1, 1 \rbrace$ where, say, $h(\alpha) = -1$ and $h(\beta) = 1$. If we transform the strings $A$ and $B$ element-wise to strings $A'$ and $B'$ in $\lbrace -1, 1\rbrace^n$, we can then compute all of the $d_H\left(A, B^{(k)}\right)$ values via a FFT of the concatenated string $B'B'$ and $A'$. We can see this by first considering the computation of $d_H(A,B)$. Suppose $I_{=} \subseteq [n]$ is the set of indices for characters where $A$ and $B$ are the same and make $I_{\neq} = [n] \setminus I_{=}$ the set of indices where $A$ and $B$ differ. Clearly $I_{=}$ and $I_{\neq}$ are disjoint, so $|I_{=}| + |I_{\neq}| = n$. Now let us compute the inner product of $A'$ and $B'$. Any element where $A$ and $B$ have the same character, $A'$ and $B'$ will have the same sign at that element. Any element where $A$ and $B$ differ, the signs will differ as well. Thus we find that
$$(A' \cdot B') = \sum_{i=1}^n A'(i) B'(i) = \sum_{i \in I_=} A'(i) B'(i) + \sum_{i \in I_{\neq}} A'(i) B'(i) = |I_=| - |I_{\neq}|$$
As $d

Solution

Let $\alpha \in \Sigma$ and $d_{\alpha, H}(A,B) = n - \sum1\{A(i)=B(i)=\alpha\}$. Then you can use your FFT technique to compute $d_{\alpha, H}(A, B)$ for each $\alpha \in \Sigma$. It will take $O(n \cdot \log(n) \cdot |\Sigma|)$ time. So you will have an $|\Sigma| \times n$ table, where you should find a column with a minimum sum, which can be done in $O(|\Sigma| \cdot n)$ time.

Context

StackExchange Computer Science Q#131994, answer score: 3

Revisions (0)

No revisions yet.