gotchaMinor
Why does the experience propagation rule for checkers work in Tom Mitchell's book?
Viewed 0 times
whytherulebookexperiencetomfordoesworkmitchell
Problem
In Tom Mitchell's book "Machine Learning", Chap.1, a checkers game is used to illustrate how machine learning can be applied solve problems.
An experience propagation rule is described for iterative learning of a hypothesis. Suppose a game has been played and watched by the program, the state of the endgame is labeled 100 for winning and -100 for losing. For each of the states on the path toward the endgame, we label it as $\hat{V}(Successor)$, where $\hat{V}(state)$ is the current model output on some state. Then the model is trained by adding the new label, and iteratively, the model converges to a good checkers program.
Why does this experience propagation rule work? It is mentioned in the book that it works quite well for most chess games.
An experience propagation rule is described for iterative learning of a hypothesis. Suppose a game has been played and watched by the program, the state of the endgame is labeled 100 for winning and -100 for losing. For each of the states on the path toward the endgame, we label it as $\hat{V}(Successor)$, where $\hat{V}(state)$ is the current model output on some state. Then the model is trained by adding the new label, and iteratively, the model converges to a good checkers program.
Why does this experience propagation rule work? It is mentioned in the book that it works quite well for most chess games.
Solution
Sutton's book on reinforcement learning explains it but probably in more detail than what you're looking for, so I'll try to explain it more briefly and clearly.
Start by thinking about the last position before the end of the game. The winning player can win in one move from that position, so that position is almost certainly good for that player and bad for the other player. Now Think about the second to last position: that's only one move away from the last position which we've just established is probably a good position for the player that won in the end. The same logic applies to the third-last, fourth-last, etc.
See also: Wikipedia's page on mathematical induction, which works in a similar way.
Start by thinking about the last position before the end of the game. The winning player can win in one move from that position, so that position is almost certainly good for that player and bad for the other player. Now Think about the second to last position: that's only one move away from the last position which we've just established is probably a good position for the player that won in the end. The same logic applies to the third-last, fourth-last, etc.
See also: Wikipedia's page on mathematical induction, which works in a similar way.
Context
StackExchange Computer Science Q#3296, answer score: 5
Revisions (0)
No revisions yet.