patternMinor
Why can we not find the generally superior machine learning algorithm in practice?
Viewed 0 times
whycanthepracticesuperiorlearningalgorithmfindmachinegenerally
Problem
TL;DR Our mind is a superior meta-machine learning algorithm, it can figure out tradeoffs in a much more general case than any machine learning algorithm we have. Furthermore, its Kolmogorov complexity is 4MB or less, and thus should be fairly easy to discover. However, despite our best efforts, no algorithm we've discovered has even approached this meta-capability. Is there a CS grounded explanation for this? The NFLT is not applicable because it does not apply in practice, and the human mind is an instance of said meta-algorithm.
There are many machine learning algorithms. The no free lunch theorem (NFLT) says, in theory, all these algorithms are exactly equal according to an error function over all possible problem domains. In practice, this doesn't matter because we are only dealing with a very small subset of these problem domains.
In practice, this often involves a tradeoff between multiple objectives. Making a problem multi-objective instead of single objective often causes it to become much more complex and difficult. There is no longer a single best solution, but there are still better and worse solutions. One way of describing the set of better solutions is with the notion of "Pareto optimality," which is the set of solutions that dominate all other solutions in all objectives, but do not dominate each other. These optimal solutions describe a surface in a graph of the solutions known as the "Pareto front." The following Figure is an example of a Pareto front where solutions A-H dominate all other solutions, but not each other.
That being the case, if we ask "what is the best ML algorithm?" we cannot say "None, b/c NFLT," since we are asking about the best ML algorithm in practice. If the NFLT does not matter in practice, then there must be a best ML algorithm, in practice. If there is a set of algorithms that describe the Pareto front, then the best algorithm is any member of this set.
What is this best practical algorithm? There does not s
There are many machine learning algorithms. The no free lunch theorem (NFLT) says, in theory, all these algorithms are exactly equal according to an error function over all possible problem domains. In practice, this doesn't matter because we are only dealing with a very small subset of these problem domains.
In practice, this often involves a tradeoff between multiple objectives. Making a problem multi-objective instead of single objective often causes it to become much more complex and difficult. There is no longer a single best solution, but there are still better and worse solutions. One way of describing the set of better solutions is with the notion of "Pareto optimality," which is the set of solutions that dominate all other solutions in all objectives, but do not dominate each other. These optimal solutions describe a surface in a graph of the solutions known as the "Pareto front." The following Figure is an example of a Pareto front where solutions A-H dominate all other solutions, but not each other.
That being the case, if we ask "what is the best ML algorithm?" we cannot say "None, b/c NFLT," since we are asking about the best ML algorithm in practice. If the NFLT does not matter in practice, then there must be a best ML algorithm, in practice. If there is a set of algorithms that describe the Pareto front, then the best algorithm is any member of this set.
What is this best practical algorithm? There does not s
Solution
There are many dimensions that "intelligence" could potentially be measured on. We already know of some artificial intelligence schemes that do better than humans on some dimensions.
It's true that we don't have artificial intelligence schemes that do better than humans, on the sorts of problems that tend to arise in practice as part of everyday human life and experience. On the other hand, humans have had hundreds of millions of years to evolve and adapt. Artificial intelligence has only been around for maybe half a decade. That might be one possible explanation.
Separately: The question seems to be based on several misconceptions. Let me point out a few things you've written, and try to clarify the misconceptions:
The point is that our mind is a superior algorithm, it can figure out tradeoffs in a much more general case than any machine learning algorithm we have.
Our mind is better for some tasks, but not all. There are other tasks where machines are better. For instance, machines are superior at chess, or at adding up small integers billions of times in a row without making a single error. Humans are superior at making other people laugh, cry, or feel better about themselves, or at folding laundry.
For instance, machines can solve Connect Four perfectly, but most people can't. Thus, if we allow "performance at Connect Four" as one of the dimensions that we're measuring Pareto-optimality at, machines are Pareto-optimal: they dominate humans in at least one dimension.
Now what you're probably really asking is, why do humans dominate machines in most dimensions? But to make that question well-defined, we have to define what is the set of all dimensions. What's probably implicit in that question is the assumption that you mean "the set of all dimensions we (humans) care about", or "the set of all dimensions that are meaningful to us (humans)". Now consider that the human brain has had hundreds of millions of years to evolve and get better at the dimensions that humans care about or that matter to us, whereas artificial intelligence hasn't.
Furthermore, its Kolmogorov complexity is 4MB or less, and thus should be fairly easy to discover.
Actually, no, that doesn't follow. Let's suppose for the sake of argument that an entire human organism can be described in 4MB of data (after compression; let's not quibble about whether that's actually biologically correct; and let's not worry about the fact that Kolmogorov complexity is primarily meaningful in an asymptotic sense, and ignore those issues), or equivalently, 32 megabits of data.
That means that a program that behaves like a human would require 32,000,000 bits of data to describe. So, you're asking: Why is it hard to find such a program? Perhaps implicit in your question is: How many programs can there be? How hard can it be to find the right one?
The answer is: it might be very very hard. There are $2^{32,000,000}$ possible programs (of that length). That's an enormously large space -- and it's a space where maybe all but one of them are useless, and only one is any good. Trying all of the possibilities is way beyond feasible: like, you couldn't do it within the lifetime of the universe, even if you imagine a universe full of supercomputers, where each atom in the known universe has the computational power of one of today's supercomputers. $2^{32,000,000}$ is an unimaginably large number.
So, it shouldn't be such a great surprise that it might be hard to find an example of such a program explicitly. Just because we know that it exists doesn't make it easy to find such a thing.
For instance, even if space aliens had landed 150 years ago and told us that there exists a proof of Fermat's last theorem, it still would have taken close to 150 years to find the proof. Just knowing that a proof exists doesn't really help you in finding it. I bet that the Latex source file for the papers that prove Fermat's last theorem correct could be compressed to less than 4MB. Even if I tell you that there exists a proof whose Kolmogorov complexity is at most 4MB, it's still not at all easy to find that proof.
It's true that we don't have artificial intelligence schemes that do better than humans, on the sorts of problems that tend to arise in practice as part of everyday human life and experience. On the other hand, humans have had hundreds of millions of years to evolve and adapt. Artificial intelligence has only been around for maybe half a decade. That might be one possible explanation.
Separately: The question seems to be based on several misconceptions. Let me point out a few things you've written, and try to clarify the misconceptions:
The point is that our mind is a superior algorithm, it can figure out tradeoffs in a much more general case than any machine learning algorithm we have.
Our mind is better for some tasks, but not all. There are other tasks where machines are better. For instance, machines are superior at chess, or at adding up small integers billions of times in a row without making a single error. Humans are superior at making other people laugh, cry, or feel better about themselves, or at folding laundry.
For instance, machines can solve Connect Four perfectly, but most people can't. Thus, if we allow "performance at Connect Four" as one of the dimensions that we're measuring Pareto-optimality at, machines are Pareto-optimal: they dominate humans in at least one dimension.
Now what you're probably really asking is, why do humans dominate machines in most dimensions? But to make that question well-defined, we have to define what is the set of all dimensions. What's probably implicit in that question is the assumption that you mean "the set of all dimensions we (humans) care about", or "the set of all dimensions that are meaningful to us (humans)". Now consider that the human brain has had hundreds of millions of years to evolve and get better at the dimensions that humans care about or that matter to us, whereas artificial intelligence hasn't.
Furthermore, its Kolmogorov complexity is 4MB or less, and thus should be fairly easy to discover.
Actually, no, that doesn't follow. Let's suppose for the sake of argument that an entire human organism can be described in 4MB of data (after compression; let's not quibble about whether that's actually biologically correct; and let's not worry about the fact that Kolmogorov complexity is primarily meaningful in an asymptotic sense, and ignore those issues), or equivalently, 32 megabits of data.
That means that a program that behaves like a human would require 32,000,000 bits of data to describe. So, you're asking: Why is it hard to find such a program? Perhaps implicit in your question is: How many programs can there be? How hard can it be to find the right one?
The answer is: it might be very very hard. There are $2^{32,000,000}$ possible programs (of that length). That's an enormously large space -- and it's a space where maybe all but one of them are useless, and only one is any good. Trying all of the possibilities is way beyond feasible: like, you couldn't do it within the lifetime of the universe, even if you imagine a universe full of supercomputers, where each atom in the known universe has the computational power of one of today's supercomputers. $2^{32,000,000}$ is an unimaginably large number.
So, it shouldn't be such a great surprise that it might be hard to find an example of such a program explicitly. Just because we know that it exists doesn't make it easy to find such a thing.
For instance, even if space aliens had landed 150 years ago and told us that there exists a proof of Fermat's last theorem, it still would have taken close to 150 years to find the proof. Just knowing that a proof exists doesn't really help you in finding it. I bet that the Latex source file for the papers that prove Fermat's last theorem correct could be compressed to less than 4MB. Even if I tell you that there exists a proof whose Kolmogorov complexity is at most 4MB, it's still not at all easy to find that proof.
Context
StackExchange Computer Science Q#75131, answer score: 5
Revisions (0)
No revisions yet.