patternMinor
Binary code with constraint
Viewed 0 times
codebinarywithconstraint
Problem
Suppose I have an alphabet of n symbols. I can efficiently encode them with $\lceil \log_2n\rceil$-bits strings. For instance if n=8:
A: 0 0 0
B: 0 0 1
C: 0 1 0
D: 0 1 1
E: 1 0 0
F: 1 0 1
G: 1 1 0
H: 1 1 1
Now I have the additional constraint that each column must contain at most p bits set to 1. For instance for p=2 (and n=8), a possible solution is:
A: 0 0 0 0 0
B: 0 0 0 0 1
C: 0 0 1 0 0
D: 0 0 1 1 0
E: 0 1 0 0 0
F: 0 1 0 1 0
G: 1 0 0 0 0
H: 1 0 0 0 1
Given n and p, does an algorithm exist to find an optimal encoding (shortest length) ? (and can it be proved that it computes an optimal solution?)
EDIT
Two approaches have been proposed so far to estimate a lower bound on the number of bits $m$. The goal of this section is to provide an analysis and a comparaison of the two answers, in order to explain the choice for the best answer.
Yuval's approach is based on entropy and provides a very nice lower bound: $\frac{logn}{h(p/n)}$ where $h(x) = xlogx + (1-x)log(x)$.
Alex's approach is based on combinatorics. If we develop his reasonning a bit more, it is also possible to compute a very good lower bound:
Given $m$ the number of bits $\geq\lceil log_2(n)\rceil$, there exists a unique $k$ such that
$$ 1+\binom{m}{1} + ... +\binom{m}{k} \lt n \leq 1+\binom{m}{1} + ... + \binom{m}{k}+\binom{m}{k+1}$$
One can convince himself that an optimal solution will use the codeword with all bits low, then the codewords with 1 bit high, 2 bits high, ..., k bits high. For the $n-1-\binom{m}{1}-...-\binom{m}{k}$ remaining symbols to encode, it is not clear at all which codewords it is optimal to use but, for sure the weights $w_i$ of each column will be bigger than they would be if we could use only codewords with $k+1$ bits high and have $|w_i - w_j| \leq 1$ for all $i, j$.
Therefore one can lower bound $p=max(w_i)$ with $$p_m = 0 + 1 + \binom{m-1}{2} +... + \binom{m-1}{k-1} + \lceil \frac{(n-1-\binom{m}{1}-...-\binom{m}{k}) (k+1)}{m} \rceil$$
Now, given
A: 0 0 0
B: 0 0 1
C: 0 1 0
D: 0 1 1
E: 1 0 0
F: 1 0 1
G: 1 1 0
H: 1 1 1
Now I have the additional constraint that each column must contain at most p bits set to 1. For instance for p=2 (and n=8), a possible solution is:
A: 0 0 0 0 0
B: 0 0 0 0 1
C: 0 0 1 0 0
D: 0 0 1 1 0
E: 0 1 0 0 0
F: 0 1 0 1 0
G: 1 0 0 0 0
H: 1 0 0 0 1
Given n and p, does an algorithm exist to find an optimal encoding (shortest length) ? (and can it be proved that it computes an optimal solution?)
EDIT
Two approaches have been proposed so far to estimate a lower bound on the number of bits $m$. The goal of this section is to provide an analysis and a comparaison of the two answers, in order to explain the choice for the best answer.
Yuval's approach is based on entropy and provides a very nice lower bound: $\frac{logn}{h(p/n)}$ where $h(x) = xlogx + (1-x)log(x)$.
Alex's approach is based on combinatorics. If we develop his reasonning a bit more, it is also possible to compute a very good lower bound:
Given $m$ the number of bits $\geq\lceil log_2(n)\rceil$, there exists a unique $k$ such that
$$ 1+\binom{m}{1} + ... +\binom{m}{k} \lt n \leq 1+\binom{m}{1} + ... + \binom{m}{k}+\binom{m}{k+1}$$
One can convince himself that an optimal solution will use the codeword with all bits low, then the codewords with 1 bit high, 2 bits high, ..., k bits high. For the $n-1-\binom{m}{1}-...-\binom{m}{k}$ remaining symbols to encode, it is not clear at all which codewords it is optimal to use but, for sure the weights $w_i$ of each column will be bigger than they would be if we could use only codewords with $k+1$ bits high and have $|w_i - w_j| \leq 1$ for all $i, j$.
Therefore one can lower bound $p=max(w_i)$ with $$p_m = 0 + 1 + \binom{m-1}{2} +... + \binom{m-1}{k-1} + \lceil \frac{(n-1-\binom{m}{1}-...-\binom{m}{k}) (k+1)}{m} \rceil$$
Now, given
Solution
Here is a lower bound and an asymptotically matching construction, at least for some ranges of the parameters. Denote by $m$ the number of columns, and suppose for simplicity that $p \leq n/2$.
We start with a lower bound on $m$. Let $X$ be the encoding of symbol chosen uniformly at random. Let $X_1,\ldots,X_m$ be the individual coordinates, and let $w_i \leq p$ be the weight of the $i$th column. Then
$$
\log n = H(X) \leq \sum_{i=1}^m H(X_i) = \sum_{i=1}^m h(w_i/n) \leq m h(p/n).
$$
Therefore
$$
m \geq \frac{\log n}{h(p/n)}.
$$
Here $H$ is the entropy of a random variable $H(X) = -\sum_x \Pr[X=x] \log \Pr[X=x]$ and $h$ is the entropy function $h(x) = -x\log x-(1-x)\log(1-x)$. (You can use whatever base for the logarithm you want.)
The asymptotically matching construction, that should work for $p = \Omega(n)$, chooses $m$ a bit larger than this lower bound, and chooses a random encoding scheme, each bit being set to $1$ with some probability $q/n$ which is a bit smaller than $p/n$. Choosing the parameters correctly, we should get that this results in a legal encoding (all codewords are different and all column weights are at most $p$) with positive probability.
We start with a lower bound on $m$. Let $X$ be the encoding of symbol chosen uniformly at random. Let $X_1,\ldots,X_m$ be the individual coordinates, and let $w_i \leq p$ be the weight of the $i$th column. Then
$$
\log n = H(X) \leq \sum_{i=1}^m H(X_i) = \sum_{i=1}^m h(w_i/n) \leq m h(p/n).
$$
Therefore
$$
m \geq \frac{\log n}{h(p/n)}.
$$
Here $H$ is the entropy of a random variable $H(X) = -\sum_x \Pr[X=x] \log \Pr[X=x]$ and $h$ is the entropy function $h(x) = -x\log x-(1-x)\log(1-x)$. (You can use whatever base for the logarithm you want.)
The asymptotically matching construction, that should work for $p = \Omega(n)$, chooses $m$ a bit larger than this lower bound, and chooses a random encoding scheme, each bit being set to $1$ with some probability $q/n$ which is a bit smaller than $p/n$. Choosing the parameters correctly, we should get that this results in a legal encoding (all codewords are different and all column weights are at most $p$) with positive probability.
Context
StackExchange Computer Science Q#45783, answer score: 4
Revisions (0)
No revisions yet.