HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Binary code with constraint

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
codebinarywithconstraint

Problem

Suppose I have an alphabet of n symbols. I can efficiently encode them with $\lceil \log_2n\rceil$-bits strings. For instance if n=8:

A: 0 0 0

B: 0 0 1

C: 0 1 0

D: 0 1 1

E: 1 0 0

F: 1 0 1

G: 1 1 0

H: 1 1 1

Now I have the additional constraint that each column must contain at most p bits set to 1. For instance for p=2 (and n=8), a possible solution is:

A: 0 0 0 0 0

B: 0 0 0 0 1

C: 0 0 1 0 0

D: 0 0 1 1 0

E: 0 1 0 0 0

F: 0 1 0 1 0

G: 1 0 0 0 0

H: 1 0 0 0 1

Given n and p, does an algorithm exist to find an optimal encoding (shortest length) ? (and can it be proved that it computes an optimal solution?)

EDIT

Two approaches have been proposed so far to estimate a lower bound on the number of bits $m$. The goal of this section is to provide an analysis and a comparaison of the two answers, in order to explain the choice for the best answer.

Yuval's approach is based on entropy and provides a very nice lower bound: $\frac{logn}{h(p/n)}$ where $h(x) = xlogx + (1-x)log(x)$.

Alex's approach is based on combinatorics. If we develop his reasonning a bit more, it is also possible to compute a very good lower bound:

Given $m$ the number of bits $\geq\lceil log_2(n)\rceil$, there exists a unique $k$ such that
$$ 1+\binom{m}{1} + ... +\binom{m}{k} \lt n \leq 1+\binom{m}{1} + ... + \binom{m}{k}+\binom{m}{k+1}$$
One can convince himself that an optimal solution will use the codeword with all bits low, then the codewords with 1 bit high, 2 bits high, ..., k bits high. For the $n-1-\binom{m}{1}-...-\binom{m}{k}$ remaining symbols to encode, it is not clear at all which codewords it is optimal to use but, for sure the weights $w_i$ of each column will be bigger than they would be if we could use only codewords with $k+1$ bits high and have $|w_i - w_j| \leq 1$ for all $i, j$.
Therefore one can lower bound $p=max(w_i)$ with $$p_m = 0 + 1 + \binom{m-1}{2} +... + \binom{m-1}{k-1} + \lceil \frac{(n-1-\binom{m}{1}-...-\binom{m}{k}) (k+1)}{m} \rceil$$

Now, given

Solution

Here is a lower bound and an asymptotically matching construction, at least for some ranges of the parameters. Denote by $m$ the number of columns, and suppose for simplicity that $p \leq n/2$.

We start with a lower bound on $m$. Let $X$ be the encoding of symbol chosen uniformly at random. Let $X_1,\ldots,X_m$ be the individual coordinates, and let $w_i \leq p$ be the weight of the $i$th column. Then
$$
\log n = H(X) \leq \sum_{i=1}^m H(X_i) = \sum_{i=1}^m h(w_i/n) \leq m h(p/n).
$$
Therefore
$$
m \geq \frac{\log n}{h(p/n)}.
$$
Here $H$ is the entropy of a random variable $H(X) = -\sum_x \Pr[X=x] \log \Pr[X=x]$ and $h$ is the entropy function $h(x) = -x\log x-(1-x)\log(1-x)$. (You can use whatever base for the logarithm you want.)

The asymptotically matching construction, that should work for $p = \Omega(n)$, chooses $m$ a bit larger than this lower bound, and chooses a random encoding scheme, each bit being set to $1$ with some probability $q/n$ which is a bit smaller than $p/n$. Choosing the parameters correctly, we should get that this results in a legal encoding (all codewords are different and all column weights are at most $p$) with positive probability.

Context

StackExchange Computer Science Q#45783, answer score: 4

Revisions (0)

No revisions yet.