patternMinor
Equivalence of Kolmogorov-Complexity definitions
Viewed 0 times
kolmogorovcomplexityequivalencedefinitions
Problem
There are many ways to define the Kolmogorov-Complexity, and usually, all these definitions they are equivalent up to an additive constant. That is if $K_1$ and $K_2$ are kolmogorov complexity functions (defined via different languages or models), then there exists a constant $c$ such that for every string $x$, $|K_1(x) - K_2(x)|
Are $K_1$ and $K_2$ equivalent? What is the relation between them, and which one grasps better the concept of Kolmogorov complexity, if they are not equivalent.
What especially bugs me is the rate $K_2$ increase with $x$, which seems not to be super-linear (or at least linear with constant $C>1$ such that $K_2 < C|x|$, rather than $|x|+c$).
Consider the most simple TM that outputs $x$ - the one that just encodes $x$ as part of its states and transitions function. it is immediate to see that
$K_1(x) \le |x|+1$. However the encoding of the same machine is much larger, and the trivial bound I get is $K_2(x) \le |x|\log |x|$.
- Length of Program: Define $K_2(x)$ to be the shortest "program" that outputs $x$. Namely, fix a way to encode TMs into binary strings; for a machine $M$ denote its (binary) encoding as $\langle M \rangle$. $K_2(x) = \min |\langle M \rangle|$ where the minimum is over all $M$'s that output $x$ on empty input.
Are $K_1$ and $K_2$ equivalent? What is the relation between them, and which one grasps better the concept of Kolmogorov complexity, if they are not equivalent.
What especially bugs me is the rate $K_2$ increase with $x$, which seems not to be super-linear (or at least linear with constant $C>1$ such that $K_2 < C|x|$, rather than $|x|+c$).
Consider the most simple TM that outputs $x$ - the one that just encodes $x$ as part of its states and transitions function. it is immediate to see that
$K_1(x) \le |x|+1$. However the encoding of the same machine is much larger, and the trivial bound I get is $K_2(x) \le |x|\log |x|$.
Solution
I apologize in advance for that I give way too many details, but I'm about to contradict people.
About $K(x)≤K'(x)+c$
The fact that $K_1(x)≤K_2(x)+c$ usually comes from an interpreter of the description language #2 into the description language #1 and not from a translation from programs of #2 into programs of #1.
For example $K_\mathtt{C}(x)≤K_\mathtt{Python}(x)+c_\mathtt{py2c}$ and you get this inequality as simply as this:
Then your constant $c_\mathtt{py2c}$ will be something like $528+490240688$ where $528$ is the number of bits for this code and $490240688$ bits is the size the official Python interpreter written in C. Of course you only need to interpret what is possible in your description language for Python so you can do better than 69 MB :-)
What is important is that you can write your Python program linearly in your C code. For example, a language where you need to put "BANANA" between every character is not a very good description program and the property is then false. (But if the description language authorizes you to write data in a separate file or in a block, this problem disappear)
Why your $K_1(x)=q$ is flawed
The problem with your definition of $K_1$ is that you may need more than $q$ bits to describe a Turing machine with $q$ states because you need to encode transitions.
So no $K_1$ and $K_2$ are probably not equivalent, but that's mainly $K_1$'s fault. I think that we can prove that for all $a>0$ there is a $c_a$ such that $K_1(x)≤a|x|+c_a$. Of course any $a0$) .. maybe less than $\log|x|$, even, but obtaining $O(|x|)$ seems hard. (And I expect it should be $|x|$, not even $O(|x|)$.)
About $K(x)≤K'(x)+c$
The fact that $K_1(x)≤K_2(x)+c$ usually comes from an interpreter of the description language #2 into the description language #1 and not from a translation from programs of #2 into programs of #1.
For example $K_\mathtt{C}(x)≤K_\mathtt{Python}(x)+c_\mathtt{py2c}$ and you get this inequality as simply as this:
void py_run(char * s) {
// code of your Python interpreter
}
int main(void) {
py_run("Put here your Python program of size Kpython(x)");
}Then your constant $c_\mathtt{py2c}$ will be something like $528+490240688$ where $528$ is the number of bits for this code and $490240688$ bits is the size the official Python interpreter written in C. Of course you only need to interpret what is possible in your description language for Python so you can do better than 69 MB :-)
What is important is that you can write your Python program linearly in your C code. For example, a language where you need to put "BANANA" between every character is not a very good description program and the property is then false. (But if the description language authorizes you to write data in a separate file or in a block, this problem disappear)
Why your $K_1(x)=q$ is flawed
The problem with your definition of $K_1$ is that you may need more than $q$ bits to describe a Turing machine with $q$ states because you need to encode transitions.
So no $K_1$ and $K_2$ are probably not equivalent, but that's mainly $K_1$'s fault. I think that we can prove that for all $a>0$ there is a $c_a$ such that $K_1(x)≤a|x|+c_a$. Of course any $a0$) .. maybe less than $\log|x|$, even, but obtaining $O(|x|)$ seems hard. (And I expect it should be $|x|$, not even $O(|x|)$.)
Code Snippets
void py_run(char * s) {
// code of your Python interpreter
}
int main(void) {
py_run("Put here your Python program of size Kpython(x)");
}Context
StackExchange Computer Science Q#146, answer score: 6
Revisions (0)
No revisions yet.