Are nearly all natural numbers compressible?
Problem
A simple counting argument shows most strings can't be compressed to shorter strings. But compression is usually defined using Kolmogorov complexity: a string $s$ is compressible if its Kolmogorov complexity is less than its length, $K(s) < |s|$.
We know a natural number $n$ can be represented as a binary number with $\lfloor\log_2(n)\rfloor+1$ bits. Call a number super compressible if $K(n) < \log_2(n)+1$. For example, 4 is super compressible because $K(4)=2 < \log_2(4)+1 = 3$. It is known that $\Sigma(5) \ge 4098$. This means $K(4098) \le 5$, so 4098 is super compressible. By extending this 5-state machine we can show 4099 through 4107 are also super compressible. In general, the range $\Sigma(n)$ through $\Sigma(n)+\log_2(\Sigma(n))-n$ will be super compressible. For example, it is known that $\Sigma(6) \ge 3.515 \times 10^{18267}$. This proves the range $3.515 \times 10^{18267}$ through $3.515 \times 10^{18267} + \log_2(10^{18267})$ is super compressible.
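To make the bookkeeping concrete, here is a small sketch (my own illustration, not part of the question) comparing a number's binary length $\lfloor\log_2 n\rfloor + 1$ with the claimed complexity bounds; the bound $\Sigma(5) \ge 4098$ is taken from the question.

```python
# The binary length of n is floor(log2 n) + 1, which Python exposes
# directly as int.bit_length().
for n in (4, 4098):
    print(n, n.bit_length())

# 4 is super compressible: K(4) = 2 < bit_length(4) = 3.
assert (4).bit_length() == 3

# Per the question, a 5-state busy beaver witnesses K(4098) <= 5,
# while the plain binary representation of 4098 needs 13 bits.
assert (4098).bit_length() == 13
```

The gap between 5 and 13 here is exactly the kind of saving the "super compressible" definition is after.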
Are all large enough natural numbers super compressible? If not, why not?
Note there are many machines besides the ones defined by $\Sigma(n)$. There are probably many halting 5-state machines that write more than $2^5$ 1's and fewer than 4098 1's. All such machines define ranges of super compressible numbers. I previously asked a similar question without getting an adequate answer.
Solution
For every $n$ there is a "busy beaver" machine (i.e., a Turing machine on the tape alphabet $\{0,1\}$ run on the empty tape) which outputs $1^n$ using $O(\log n/\log\log n)$ states, which is asymptotically optimal (up to constants).
Here is how this machine works. For every $C$, we will construct such a machine having $O(\log n/C) + O(C2^C)$ states. The machine will first write the binary representation of $n$, and then use $O(1)$ states to convert it to its unary representation (that is, $1^n$); we will only describe the first phase.
The machine runs in $\log n/C$ rounds, maintaining a unary counter whose value at round $r$ is $1^r$ (after all rounds finish, we erase the counter). At each round, we write the next $C$ bits of the binary representation of $n$. This is accomplished using $O(C2^C)$ states which implement appending $C$ bits (one block of states for each of the $2^C$ possible choices of those bits), and $O(\log n/C)$ states for control (the unary counter is used to switch control to the correct state). This completes the description of the machine.
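The arithmetic skeleton of the two phases can be sketched in a few lines (this simulates the round structure in Python rather than giving a literal transition table; `encode_rounds` and `decode_to_unary` are names I've invented for the sketch):

```python
def encode_rounds(n, C):
    """Phase 1, abstractly: split the binary representation of n into
    C-bit chunks (MSB first) - each round of the machine contributes
    one such chunk."""
    bits = bin(n)[2:]
    # pad on the left so the length is a multiple of C
    bits = bits.zfill(-(-len(bits) // C) * C)
    return [bits[i:i + C] for i in range(0, len(bits), C)]

def decode_to_unary(chunks):
    """Phase 2, abstractly: fold the chunks back into n
    (value = value * 2^C + chunk), then emit 1^n, the unary
    output the busy beaver machine writes."""
    value = 0
    C = len(chunks[0])
    for chunk in chunks:
        value = value * (2 ** C) + int(chunk, 2)
    return '1' * value

chunks = encode_rounds(1000, 3)   # ceil(10/3) = 4 rounds of 3 bits
assert decode_to_unary(chunks) == '1' * 1000
```

Each chunk costs $O(2^C)$ hard-coded states to append, and the number of rounds is $\lceil \log n / C \rceil$, which is where the $O(C2^C) + O(\log n/C)$ state count comes from.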
A machine with $S$ states on the binary tape alphabet has description size $O(S\log S)$. Hence we expect the optimal answer to be $O(\log n/\log\log n)$. Indeed, if we choose $C$ carefully to be $\log (\log n/(\log\log n)^2)$, then the construction above gives a "busy beaver" machine with $O(\log n/\log\log n)$ states, which is optimal up to constants.
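As a quick check of the arithmetic (my own sketch, not part of the original answer), plugging this choice of $C$ into the two state counts gives
$$2^C = \frac{\log n}{(\log\log n)^2}, \qquad C \cdot 2^C = O\!\left(\log\log n \cdot \frac{\log n}{(\log\log n)^2}\right) = O\!\left(\frac{\log n}{\log\log n}\right),$$
and likewise $\log n/C = O(\log n/\log\log n)$ since $C = \Theta(\log\log n)$. This is consistent with the description-size bound: for $S = \Theta(\log n/\log\log n)$ we get $S\log S = \Theta(\log n)$, matching the $\Theta(\log n)$ bits needed to specify a typical $n$.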
If the tape alphabet is allowed to grow with $n$, then $O(1)$ states suffice to output $1^n$: the alphabet will be of size $n + O(1)$, and we will use a certain cell as a counter. This construction shows that it is important to limit the tape alphabet.
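A minimal simulation of this trick (again my own sketch, with invented names, not a literal machine): the counter lives in a single cell as one symbol of the large alphabet, so the control needs only a constant number of states.

```python
def big_alphabet_machine(n):
    """Simulate the O(1)-state construction: the tape alphabet is
    {blank, '1'} plus counter symbols 0..n, and one cell stores the
    whole counter as a single symbol. Each sweep rewrites the counter
    cell with the next symbol down and appends one more '1'."""
    counter_cell = n        # the distinguished cell, holding symbol "n"
    output = []
    while counter_cell > 0:
        counter_cell -= 1   # rewrite the counter cell (one tape write)
        output.append('1')  # write a '1' in the output region
    return ''.join(output)

assert big_alphabet_machine(7) == '1111111'
```

All the information about $n$ is smuggled into the alphabet rather than the state graph, which is why bounding the tape alphabet is essential for the busy beaver question to be meaningful.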
Context
StackExchange Computer Science Q#47367, answer score: 6