patternMinor

Are coevolutionary "Free Lunches" really free lunches?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

machine-learning optimization cs stackoverflow

reallylunchesfreearecoevolutionary

Problem

In their paper "Coevolutionary Free Lunches" David Wolpert and William Macready describe a set of exceptions to the No Free Lunch theorems they proved in an earlier paper. The exceptions involve two-player games in which a player tries to minimize expected search cost given optimal play, or at least good play, by an opponent.

Free lunches are "allowed" in this case because the exact choice of cost function to minimize changes depending on which fitness (i.e. objective) functions previous rounds of play have "ruled out." In other words, given that an opponent already knows something about the game, and chooses responses that minimize the player's expected return, the player can eliminate certain fitness functions without having to evaluate them. To illustrate how this works, W&M provide this chart:

Here, $\langle g \rangle$ represents the search algorithm that pays no attention to the opponent's moves; $\langle g \rangle_{1}$ represents the search algorithm that considers all possible opponent replies to each move; and $\langle g \rangle_{2}$ represents the search algorithm that samples just one possible opponent reply to each move.

This illustrates the character of the free lunch provided: algorithms that take into account information provided by the opponent do better than algorithms that don't, and those that collect as much information as possible from the opponent do better than those that only collect some. W&M amplify this point in their later discussion of opponent intelligence. They show that even when an opponent is not omniscient, but has partial knowledge, the player can exploit its partial knowledge. In the case of total ignorance this won't work because the opponent always replies with a random move. In that case, there appears to be no gain:

the expected performance of the agent will be the average over the antagonist’s possible responses.

I guess this looks like $\langle g \rangle$ above. But in cases where the opponent has some knowledge

Solution

Coevolutionary algorithms can't magically accelerate progress on any arbitrary problem class. So in that sense, the conclusion at the end of the question is correct. However, it doesn't follow that all coevolutionary free lunches are trivial, as the conclusion of the question suggests.

I can't offer an exhaustive account of all the kinds of coevolutionary free lunches, but I can offer two examples, the first of which is trivial, and the second of which I would argue is nontrivial. The second is nontrivial because it helps explain why the no free lunch theorem really must hold.

The important difference between the two examples is this: in the first, the competing algorithms are competing to achieve the same overarching goal, while in the second, the competing algorithms are trying to achieve different goals. In the second case, the mismatch between the two algorithms' goals allows interesting things to happen. I'll begin with the trivial example.

Opponents seeking the same goal

Imagine a very simple optimization problem in which the search landscape is a 7x7 grid of cells. The primary goal is to find the cell with the maximum value. 48 of the cells have a value of 0, and one randomly chosen cell on the grid has a value of 1.

Our secondary goal is to discover a search strategy that finds the maximum value more quickly. But it follows from the initial problem that no strategy could possibly beat random search here, because nothing can be learned from one cell about another. Nonetheless, the coevolutionary free lunch theorem holds! Here's why:

Suppose you have two optimization algorithms, A and B, both searching the grid. It doesn't really matter what strategy they use, but for concreteness, we'll stipulate that they both use a random search strategy. The only difference between them is that B pays attention to A's moves, and when it sees that A has found the maximum value, it jumps to that cell too. In some sense, when that happens, B has still "lost" the contest. But if you run a lot of competitions, and then compare the average performance of A to the average performance of B, you'll see that B finds the maximum value faster on average.

The explanation is simple. The average time to first discovery -- whether by A or B -- stays the same. But whenever A beats B to the win, B doesn't bother looking anywhere else. It skips ahead to the best cell. When B beats A to the win, on the other hand, A just continues searching, plodding along until it finds the maximum value on its own.

This looks like a win if you're only counting the number of moves B makes. If you look at the total number of moves that A and B make together, they actually do worse together than either would on its own, on average. That's because of A's obliviousness. If we change A to behave the same way as B, then they do about as well together as either would on its own -- but no better.

So here, modeling both algorithms together returns us directly to the no free lunch zone, just as the question argues. In effect, A and B are just performing random search algorithms in parallel. The final number of search operations remains the same.

Opponents seeking different goals

Now imagine a very different scenario. Suppose we have a classification problem: recognizing sheep. Here, A's job is to look at a stream of pictures and say whether there are sheep in them. Simple enough.

But B's job is very different. B has the power to inject pictures of its own into the stream! Its goal is not to identify sheep; it just wants to slow A down.

How can B do this? There's a great blog post and associated Twitter thread about a closely related question by Janelle Shane. The Twitter thread begins:

Does anyone have a picture of sheep in a really unusual place? It's for pranking a neural net.

And here's one of the first replies:

How about an orange sheep?

Here's the orange sheep:

Turns out, this was perfect for pranking:

You totally got it. Orange sheep are not a thing it was expecting.
"a brown cow laying on top of a lush green field"

Here are a couple of other examples from Shane's blog post:

So what does this have to do with our problem? We can connect them by being more precise. Suppose A's goal is to reach greater than 99% accuracy, and B has the power to inject one picture into A's stream for every nine "natural" pictures. B looks for patterns in A's behavior and uses them to find pictures that mess with its model. This will keep A's accuracy below 99% for much longer than if A saw only "natural" pictures.

Two things follow from this. First, B will do much better if it pays attention to what A does. If B just picks images on the basis of some randomly chosen general principle like "sheep in odd places," then there's a good chance that A will be prepared for them already. If not, it will quickly learn to handle them correctly, and B will then have to adopt a new strategy. On the other hand, if B watches A's behavior, it can pick out

Context

StackExchange Computer Science Q#57546, answer score: 2

Revisions (0)

No revisions yet.