
Why does the Min-Max algorithm delay a good move indefinitely?

Submitted by: @import:stackexchange-cs

Problem

Consider a simple chess example:

Q is the white queen.

K is the black king, and each R is a black rook.

A B
1 . Q
2 . .
3 K .
4 . .
5 . .
6 R .
7 . .
8 . R


1, 2, ..., 8 are the ranks and A and B are the files of the chessboard.

Assume the king can only move horizontally for this example.

Here, if we search to depth 0 (evaluating the position immediately after each candidate move), the best move should be Qxb8 (queen takes the rook on B8).

But if we search to depth 2, one line goes:

1. Qa1+ Kb3 2. Qxa6 (evaluation of the position is +100, as a rook has been taken)


Second line:

1. Qxb8 Ra4 2. Qb1 (evaluation of the position is +100, as a rook has been taken)


And since both lines have the same evaluation, we select the first variation and play Qa1+; let's say the opposing player plays Kb3.

Now, if we again search to depth 2, the same situation arises:

One line:

1. Qb1+ Ka3 2. Qxb8 (evaluation of the position is +100, as a rook has been taken)


Second line:

1. Qxa6 Rb4 2. Qa1 (evaluation of the position is +100, as a rook has been taken)


And again, since both lines have the same evaluation, we select the first variation and play Qb1+; let's say the opposing player plays Ka3.

This keeps repeating: the queen never captures the rook, and instead keeps checking the king in the hope of capturing the rook on the next move, only to repeat the same idea again.

I hope this example makes clear what I am trying to say.

Solution

In a game-theoretical sense, both of the moves are equally good in the scenarios you describe, so the algorithm is "correct" in that it does not matter which move it picks: you do not explicitly encode a preference for faster wins (or faster gains of $+100$) over slower ones.
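
To make the tie-breaking concrete, here is a minimal minimax sketch in Python over a toy game tree (a stand-in for the position above, not a real chess engine): both root moves come back with the same $+100$ score, and the strict greater-than comparison at the root simply keeps whichever equal-scoring move was generated first.

def minimax(node, maximizing):
    children, value = node
    if not children:
        return value                      # leaf: static eval, White's perspective
    scores = [minimax(c, not maximizing) for c in children]
    return max(scores) if maximizing else min(scores)

leaf = lambda v: ([], v)
inner = lambda *cs: (list(cs), None)

root_moves = [
    # 1. Qa1+ Kb3 2. Qxa6 -> +100 (rook captured on the *next* move)
    ("Qa1+", inner(inner(leaf(100)))),
    # 1. Qxb8 Ra4 2. Qb1 -> +100 (rook already captured)
    ("Qxb8", inner(inner(leaf(100)))),
]

best_move, best_score = None, float("-inf")
for name, subtree in root_moves:
    score = minimax(subtree, maximizing=False)  # opponent to move after ours
    if score > best_score:                      # strict '>' keeps the first move
        best_move, best_score = name, score

print(best_move, best_score)  # -> Qa1+ 100: the slower line wins the tie

Reverse the move generation order and Qxb8 would be played instead; minimax itself expresses no preference between the two.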

Introducing a discount factor to explicitly encode such a preference, as described in ryan's answer, can be a solution... but it's not very common in the game AI / minimax / alpha-beta search literature, and it can also be detrimental in slightly different scenarios. For example, if we can get a score of $+100$ in one ply, or a score of $+101$ in the second ply, we would generally prefer the path leading to that delayed $+101$... but a common choice of discount factor $\gamma = 0.9$ would lead us to prefer the suboptimal path. Note that this does not just hold for the example of $\gamma = 0.9$; by making the delayed score appropriately larger, we can construct such a counterexample for any value $0 \leq \gamma < 1$, and $\gamma = 1$ is equivalent to using no discount factor at all.
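
As a quick check of that arithmetic (a minimal Python sketch; it assumes, as in the scheme above, that a score gained after $n$ plies is discounted by $\gamma^n$):

gamma = 0.9
immediate = gamma * 100      # +100 gained after 1 ply   -> 90.0
delayed = gamma ** 2 * 101   # +101 gained after 2 plies -> 81.81
print(immediate > delayed)   # True: discounting prefers the inferior +100 line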

In practice, a more natural solution follows from how these searches are usually run. Rather than performing a single search to a fixed depth, we typically:

  • Run a $1$-ply search first

  • Proceed to run a $2$-ply search afterwards if we still have time

  • Proceed to run a $3$-ply search if we still have time

  • etc.

This is referred to as iterative deepening. Now, when we start a new search (with an increased depth limit), a common trick is to first re-order the moves at the root node based on the scores found in the previous search (with the lower depth limit).

Now, imagine that we apply this process to the scenario described in your question. The Qxb8 line of play will first be discovered to give a $+100$ score in one of the first iterations (shallow searches). Based on this, we will sort the Qxb8 line to be searched before the Qa1+ line when moving on to the next iteration with a deeper search. Both lines will then be discovered to have the same $+100$ score, but now, thanks to the move ordering that iterative deepening enables (a generally useful technique, not something specific to this case), the shorter line will have been searched first and will be the preferred line of play.
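
A sketch of that loop in Python (legal_moves and fixed_depth_search are hypothetical placeholders for an engine's move generator and fixed-depth search, not any particular library's API):

def iterative_deepening(position, max_depth, out_of_time):
    moves = list(legal_moves(position))   # hypothetical move generator
    best_move = moves[0]
    for depth in range(1, max_depth + 1):
        if out_of_time():
            break
        # Score every root move with a fixed-depth search (hypothetical helper).
        scored = [(fixed_depth_search(position, move, depth), move)
                  for move in moves]
        # Sort best-first. Python's sort is stable, so equal-scoring moves keep
        # the order discovered by the previous, shallower iteration: the 1-ply
        # iteration already scored Qxb8 at +100, so deeper iterations search it
        # before Qa1+ and the strict tie-break at the root keeps the capture.
        scored.sort(key=lambda sm: sm[0], reverse=True)
        moves = [move for _, move in scored]
        best_move = moves[0]
    return best_move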

Note that the "trick" of iterative deepening plus move ordering is not just useful for solving the issue highlighted in this question; in general, it can also significantly increase the amount of pruning achieved when we extend the basic minimax algorithm with techniques like alpha-beta pruning.
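
As a self-contained toy illustration (an abstract game tree, not chess), the alpha-beta search below returns the same value either way, but visits fewer nodes when the strongest move is examined first:

def alphabeta(node, alpha, beta, maximizing, visited):
    visited[0] += 1
    children, value = node
    if not children:
        return value
    best = float("-inf") if maximizing else float("inf")
    for child in children:
        score = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:
            break                         # cutoff: remaining siblings skipped
    return best

leaf = lambda v: ([], v)
inner = lambda *cs: (list(cs), None)

# The same tree with the root's children ordered best-first vs. worst-first.
best_first = inner(inner(leaf(9), leaf(8), leaf(7)),
                   inner(leaf(3), leaf(2), leaf(1)),
                   inner(leaf(6), leaf(5), leaf(4)))
worst_first = inner(inner(leaf(3), leaf(2), leaf(1)),
                    inner(leaf(6), leaf(5), leaf(4)),
                    inner(leaf(9), leaf(8), leaf(7)))

for name, tree in [("best-first", best_first), ("worst-first", worst_first)]:
    visited = [0]
    value = alphabeta(tree, float("-inf"), float("inf"), True, visited)
    print(name, "-> value", value, "after visiting", visited[0], "nodes")
# best-first  -> value 7 after visiting 9 nodes
# worst-first -> value 7 after visiting 13 nodes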

Context

StackExchange Computer Science Q#115877, answer score: 5
