patternMinor
Distribution of the number of bits changed during generation of all binary tuples
Viewed 0 times
thenumberallduringbitstuplesgenerationbinarydistributionchanged
Problem
Knuth's TAOCP chapter 7.2.1 discusses generating all the $n$-bit binary strings.
I am interested in the total number of bits that change during this process.
During generation we store an $n$-bit string and can change some of the bits of the string in each round, with the string output at the end of each round.
Call the sequence of integers represented by the output strings the generating sequence.
Every $n$-bit string must be generated precisely once in a generating sequence.
We are interested in the sum of the Hamming distances between successive elements in the sequence, with the initial value assigned to the string regarded as an $n$ bit change.
For the ascending generating sequence $0,1,\dots,2^n-1$ the number of bits changed during the enumeration is then $2^{n+1} - 2$.
The generating sequence $0,2^n-1,1,2^n-2,2,2^n-3,\dots,2^{n-1}-1,2^{n-1}$ yields a larger number of bit changes, at least $(n+1)2^{n-1} = 2^{n-1+\log n}$, and $2^{n-1+\log n} \ge 2^{n+1} > 2^{n+1} - 2$ as long as $n\ge 4$.
A Gray code (which changes one bit at each step) yields $n+2^n-1$ bit changes.
We can normalise these by dividing by $2^n$ to obtain the mean number of bits changed per step.
What is the distribution of the mean number of bit changes per step, for a generating sequence chosen uniformly at random?
Even the expected value would be useful, or its behaviour as $n$ grows.
The mean number of bit changes per step tends to 2 for the ascending sequence and tends to 1 for Gray codes (which achieve the minimum), as $n$ increases.
However, I'm not even sure what the maximum is, other than being no larger than $n$ bits per step.
I am interested in the total number of bits that change during this process.
During generation we store an $n$-bit string and can change some of the bits of the string in each round, with the string output at the end of each round.
Call the sequence of integers represented by the output strings the generating sequence.
Every $n$-bit string must be generated precisely once in a generating sequence.
We are interested in the sum of the Hamming distances between successive elements in the sequence, with the initial value assigned to the string regarded as an $n$ bit change.
For the ascending generating sequence $0,1,\dots,2^n-1$ the number of bits changed during the enumeration is then $2^{n+1} - 2$.
The generating sequence $0,2^n-1,1,2^n-2,2,2^n-3,\dots,2^{n-1}-1,2^{n-1}$ yields a larger number of bit changes, at least $(n+1)2^{n-1} = 2^{n-1+\log n}$, and $2^{n-1+\log n} \ge 2^{n+1} > 2^{n+1} - 2$ as long as $n\ge 4$.
A Gray code (which changes one bit at each step) yields $n+2^n-1$ bit changes.
We can normalise these by dividing by $2^n$ to obtain the mean number of bits changed per step.
What is the distribution of the mean number of bit changes per step, for a generating sequence chosen uniformly at random?
Even the expected value would be useful, or its behaviour as $n$ grows.
The mean number of bit changes per step tends to 2 for the ascending sequence and tends to 1 for Gray codes (which achieve the minimum), as $n$ increases.
However, I'm not even sure what the maximum is, other than being no larger than $n$ bits per step.
Solution
It is somewhat more natural to normalize the number of bit changes per step by the number of steps, namely $2^n-1$. We can then rephrase your question as follows. Let $\pi$ be a random permutation of $\{0,1\}^n$. You are interested in the distribution of
$$
X = \frac{1}{2^n-1} \sum_{i=0}^{2^n-2} d(\pi(i),\pi(i+1)),
$$
where $d$ is Hamming distance. We can compute the moments of $X$ using linearity of expectation. The expected value of $X$ is
$$
\mathbb{E}[X] = \mathbb{E} d(\pi(0),\pi(1)) = \mathbb{E}_{x \neq 0} d(0^n,x) = \frac{\frac{n}{2} 2^n}{2^n-1} \approx \frac{n}{2}.
$$
The second moment of $X$ is
$$
\mathbb{E}[X^2] = \frac{1}{2^n-1} \mathbb{E} d(\pi(0),\pi(1))^2 + \frac{2(2^n-2)}{(2^n-1)^2} \mathbb{E} d(\pi(0),\pi(1)) d(\pi(1),\pi(2)) + \frac{(2^n-2)(2^n-3)}{(2^n-1)^2} \mathbb{E} d(\pi(0),\pi(1)) d(\pi(2),\pi(3)) \approx
\mathbb{E} d(\pi(0),\pi(1)) d(\pi(2),\pi(3)) \approx \left(\frac{n}{2}\right)^2.
$$
More generally, for every fixed $t$ we have
$$
\mathbb{E}[X^t] \approx \left(\frac{n}{2}\right)^t.
$$
This suggests that $X$ is tightly concentrated around $n/2$.
Going further, one is tempted to conjecture that the distribution is roughly normal, since the dependence between non-adjacent summands is very slight. This can be verified or refuted by repeating the calculation above with $d(\pi(i),\pi(i+1))$ replaced by $d'(\pi(i),\pi(i+1)) = d(\pi(i),\pi(i+1)) - n/2$. Since $d(\pi(i),\pi(i+1))$ is distributed almost like $\mathrm{Bin}(n,1/2)$, this suggests that $X$ is distributed almost like $N(n/2,n/4)$, a normal random variable with mean $n/2$ and variance $n/4$.
$$
X = \frac{1}{2^n-1} \sum_{i=0}^{2^n-2} d(\pi(i),\pi(i+1)),
$$
where $d$ is Hamming distance. We can compute the moments of $X$ using linearity of expectation. The expected value of $X$ is
$$
\mathbb{E}[X] = \mathbb{E} d(\pi(0),\pi(1)) = \mathbb{E}_{x \neq 0} d(0^n,x) = \frac{\frac{n}{2} 2^n}{2^n-1} \approx \frac{n}{2}.
$$
The second moment of $X$ is
$$
\mathbb{E}[X^2] = \frac{1}{2^n-1} \mathbb{E} d(\pi(0),\pi(1))^2 + \frac{2(2^n-2)}{(2^n-1)^2} \mathbb{E} d(\pi(0),\pi(1)) d(\pi(1),\pi(2)) + \frac{(2^n-2)(2^n-3)}{(2^n-1)^2} \mathbb{E} d(\pi(0),\pi(1)) d(\pi(2),\pi(3)) \approx
\mathbb{E} d(\pi(0),\pi(1)) d(\pi(2),\pi(3)) \approx \left(\frac{n}{2}\right)^2.
$$
More generally, for every fixed $t$ we have
$$
\mathbb{E}[X^t] \approx \left(\frac{n}{2}\right)^t.
$$
This suggests that $X$ is tightly concentrated around $n/2$.
Going further, one is tempted to conjecture that the distribution is roughly normal, since the dependence between non-adjacent summands is very slight. This can be verified or refuted by repeating the calculation above with $d(\pi(i),\pi(i+1))$ replaced by $d'(\pi(i),\pi(i+1)) = d(\pi(i),\pi(i+1)) - n/2$. Since $d(\pi(i),\pi(i+1))$ is distributed almost like $\mathrm{Bin}(n,1/2)$, this suggests that $X$ is distributed almost like $N(n/2,n/4)$, a normal random variable with mean $n/2$ and variance $n/4$.
Context
StackExchange Computer Science Q#47151, answer score: 4
Revisions (0)
No revisions yet.