HiveBrain v1.2.0

How do I classify my emulator input optimization problem, and with which algorithm should I approach it?

Submitted by: @import:stackexchange-cs

Problem

Due to the nature of the question, I have to include lots of background information (because my question is: how do I narrow this down?). That said, it can be summarized (to the best of my knowledge) as:

What methods exist to find local optima in extremely large combinatorial search spaces?

Background

In the tool-assisted superplay community we look to provide specially-crafted (not generated in real-time) input to a video game console or emulator in order to minimize some cost (usually time-to-completion). The way this is currently done is by playing the game frame-by-frame and specifying the input for each frame, often redoing parts of the run many times (for example, the recently published run for The Legend of Zelda: Ocarina of Time has a total of 198,590 retries).

Making these runs obtain their goal usually comes down to two main factors: route-planning and traversal. The former is much more "creative" than the latter.

Route-planning is determining which way the player should navigate overall to complete the game, and is often the most important part of the run. This is analogous to choosing which sorting method to use, for example. The best bubble sort in the world simply isn't going to outperform a quick-sort on 1 million elements.

In the desire for perfection, however, traversal (how the route is carried out) is also a huge factor. Continuing the analogy, this is how the sorting algorithm is implemented. Some routes can't even be performed without very specific frames of input. This is the most tedious part of tool-assisting and is what makes the production of a completed run take months or even years. It's not a difficult process (for a human) because it comes down to trying different variations of the same idea until one is deemed best, but humans can only try so many variations within their attention span. Applying machines to this task seems a natural fit here.

My goal now is to try to automate the traversal process in general for the Nintendo

Solution

From the information you give in your question, I cannot see how to apply standard optimisation methods (that I know of). Your objects are not that complicated (more on that later), but your target function is a nasty one: its values are defined by an external system out of your control, it is unlikely to have any nice properties, and so on. Therefore, I think genetic algorithms are a feasible, maybe even good, approach here; they often work better than other methods when you have no clue about your problem's structure. There is much to consider about

  • object space,
  • target function and
  • parameters of your genetic algorithm,

so allow me to elaborate.

What are your objects?

You have answered that already: you are looking at a sequence of actions, each of which takes up one frame. I think this may be too fine-grained; maybe try a sequence of actions, each with a duration (in number of frames). This would allow mutations like "walk a bit longer" to have different probabilities than "insert a press of A" in a natural way. Try out what works best; you may have to revisit this item after thinking about the other ingredients.
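As a rough sketch of this representation, a genome could be a list of (action, duration) pairs, with mutation probabilities chosen per kind of change. The action names and probabilities below are illustrative placeholders, not anything from the actual emulator:

```python
import random

# Placeholder action names; a real setup would use the emulator's input set.
ACTIONS = ["walk_left", "walk_right", "jump", "press_A", "idle"]

def random_genome(length=20, max_duration=30):
    """A genome: each gene holds an action and how many frames it is held."""
    return [(random.choice(ACTIONS), random.randint(1, max_duration))
            for _ in range(length)]

def mutate(genome, p_duration=0.2, p_insert=0.05):
    """'Walk a bit longer' (duration tweak) and 'insert a press of A'
    (gene insertion) get independent probabilities, as suggested above."""
    out = []
    for action, duration in genome:
        if random.random() < p_duration:
            duration = max(1, duration + random.randint(-5, 5))
        out.append((action, duration))
        if random.random() < p_insert:
            out.append((random.choice(ACTIONS), random.randint(1, 5)))
    return out
```

With a per-frame encoding, both kinds of change would collapse into "flip one frame's input", which is exactly the distinction this representation avoids.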

What is your target function?

This one is really crucial. What do you want to optimize? Time to goal? Number of different actions? The number of collected stars? A combination of several factors? As soon as you have multiple targets, things get hairy -- there is (usually) no longer a single optimum!

You mentioned time to goal. This is likely not a good target function at all. Why? Because most sequences will not even reach the goal so they will bottom-line to some constant, creating a fitness landscape like this (conceptual sketch in one dimension):

[figure omitted: one-dimensional sketch of a fitness landscape that is flat at $0$ almost everywhere, with a narrow peak near the optimum]

There are huge areas where the target function is $0$. Genetic algorithms are all about signals: small changes in the solution have to indicate an improvement (or decline) in quality if and only if the change is "directed" towards an optimal solution (ideally). If that is not the case (drastically), you have little more than a random search, hitting a good solution with probability near $0$. What does that mean for our target function? It has to be something that improves whenever a solution improves slightly, even if overall quality is still low. So what about

$\qquad \displaystyle \frac{1}{1 + \text{final distance to goal}} + \frac{1}{1 + \text{time to goal}}$

using "infinity" as time to goal if the goal is not reached, that is set the second summand to $0$. As long as the goal is not reached, getting nearer moves fitness up to $1$. All sequences that reach the goal have a baseline of $1$ and improve further the faster they are.
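Translated directly into code, the proposed target function might look like this (a sketch; `None` stands in for "infinity", i.e. the goal was not reached):

```python
def fitness(final_distance_to_goal, time_to_goal=None):
    """Fitness as in the formula above: the distance term rises towards 1
    as the character ends nearer the goal; the time term is 0 unless the
    goal is reached (time_to_goal=None means 'infinity')."""
    distance_term = 1.0 / (1.0 + final_distance_to_goal)
    time_term = 0.0 if time_to_goal is None else 1.0 / (1.0 + time_to_goal)
    return distance_term + time_term
```

Note how every sequence that reaches the goal (distance $0$) scores above $1$, strictly better than any sequence that does not, and faster sequences score higher still.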

So how do you measure distance? Linear distance may look tempting but has its problems; again, wrong signals may be sent. Consider this simple scenario:

[figure omitted: level sketch in which an upper corridor passes directly above the goal without connecting to it]

Every sequence that starts with a jump to the upper corridor improves until it reaches a spot just above the goal, but it can never actually get to the goal! Even worse, among all sequences that do not reach the goal, those that go up are as good as those that go down, so the GA cannot reject sequences that are clearly doomed. In other words, linear distance creates particularly bad local optima which can trap the GA if there are dead ends in the level.

Therefore, I suggest you overlay a grid over your level and connect neighbouring points if the game character can get from one to the other. Then you compute distance-from-goal as the length of the shortest path from the point nearest to where the sequence lands the character to the point nearest to the goal. This is easy to compute, and walking into dead ends (local optima) is immediately punished¹. Of course you need access to level data, but I assume you have that.
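Since the goal is fixed, one breadth-first search from the goal over the grid gives the walkway distance for every reachable cell at once. A minimal sketch, assuming the walkable cells are known (the cell coordinates here are made up for illustration):

```python
from collections import deque

def grid_distances(walkable, goal):
    """BFS from the goal over a grid: returns the shortest-path length
    (in grid steps) from every reachable cell to the goal.
    `walkable` is a set of (x, y) cells; `goal` must be one of them."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        x, y = queue.popleft()
        for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nbr in walkable and nbr not in dist:
                dist[nbr] = dist[(x, y)] + 1
                queue.append(nbr)
    return dist
```

A cell at the end of a dead end gets a large distance even if it is linearly right next to the goal, which is exactly the signal the linear metric fails to provide; cells from which the goal is unreachable simply never appear in the result.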

How does your GA work?

Now we can get to the actual genetic algorithm. The key considerations are population, selection, reproduction/mutation and stopping criterion.

Population

How large is your population going to be? If it is too small, it may not provide the diversity necessary to reach a good solution. If it is too large, you are more likely to carry around useless junk, slowing down the process.

How do you initialise your population? Do you pick random action sequences? If so, of which length? Do you have a (small) number of manually created, reasonable solutions to seed with, maybe some that reach the goal?
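Both options can be combined: seed with whatever hand-made runs exist and fill the rest with random sequences of random length. A sketch under the (action, duration) representation from above; `ACTIONS` and the length bounds are placeholders:

```python
import random

# Placeholder action names, as before.
ACTIONS = ["walk_left", "walk_right", "jump", "press_A", "idle"]

def initial_population(size, seeds=(), min_len=10, max_len=50):
    """Start from any hand-made seed solutions, then fill up with
    random (action, duration) sequences of random length."""
    population = [list(s) for s in seeds]
    while len(population) < size:
        length = random.randint(min_len, max_len)
        population.append([(random.choice(ACTIONS), random.randint(1, 30))
                           for _ in range(length)])
    return population
```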

Selection

Which individuals are selected for survival/reproduction? The $k$ best? Do you hold tournaments? Do you decide an individual's survival randomly with respect to its fitness? Do you want the best to survive in any case or can they die (may be useful to leave local optima)²?

The core concept here is selection pressure: how hard is it to survive? Make it too low and you don't weed out crap solutions. Make it too high and you make change (in particular, moving between local optima) hard.
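Tournament selection, mentioned above, makes this pressure an explicit knob. A minimal sketch (the fitness values would come from whatever target function you settled on):

```python
import random

def tournament_select(population, fitnesses, k=3):
    """Pick k random individuals and return the fittest. Larger k means
    higher selection pressure; k=1 is uniform random selection."""
    contenders = random.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]
```

With small `k`, weak individuals still survive occasionally, which helps the population escape local optima; with `k` near the population size, only the very best ever reproduce.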

Reproduction and Mutation

Once you have selected your survivors of one round, you have to create the next generation from them.
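One common way to combine two survivors into a new individual is one-point crossover, which works naturally for the variable-length action sequences discussed above (a sketch, with independent cut points so that parents of different lengths can recombine):

```python
import random

def one_point_crossover(parent_a, parent_b):
    """Cut each parent at a random point and splice the prefix of one
    onto the suffix of the other; child length may differ from both."""
    cut_a = random.randint(0, len(parent_a))
    cut_b = random.randint(0, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:]
```

The resulting children would then be passed through the mutation operator before entering the next generation.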

Context

StackExchange Computer Science Q#1774, answer score: 6
