patternMinor
Reinforcement Learning: An Introduction, A Gambler's Problem, Exercise 4.7 Solution
Viewed 0 times
problemgamblerexerciselearningreinforcementintroductionsolution
Problem
Currently reading a recent draft of Reinforcement Learning: An Introduction by Sutton and Barto. Really good book!
I was a bit confused by exercise 4.7 in chapter 4, section 4, page 93, (see attached photo) where it asks you to intuit about the form of the graph and the policy that converged. I do not understand why this policy is as it is, and was wondering if anyone has any insight into the problem or answers for this exercise in the book.
I have so far look extensively around Google for answer booklets or any work on the problem to no avail. I could not resolve why s=50 would be a bound with a high a=50 when surely any similar policy could be applied to s=51 with a=49? Any help is much appreciated, many thanks!
I was a bit confused by exercise 4.7 in chapter 4, section 4, page 93, (see attached photo) where it asks you to intuit about the form of the graph and the policy that converged. I do not understand why this policy is as it is, and was wondering if anyone has any insight into the problem or answers for this exercise in the book.
I have so far look extensively around Google for answer booklets or any work on the problem to no avail. I could not resolve why s=50 would be a bound with a high a=50 when surely any similar policy could be applied to s=51 with a=49? Any help is much appreciated, many thanks!
Solution
The question, as written, certainly suggests that it's a "good policy" to bet a=50 for s=50 , while not "betting it all" with a=49 for s=51. However, in an e-mail discussion surrounding this (here), Daniel Loeb asks a similar question and notes, "in particular, the simple policy of always making the largest allowed bet is optimal", which would suggest that betting 49 at s=51 would be fine. Andy Barto replies and seems to endorse this conclusion.
Perhaps, if we forget that by construction you can't bet a=51 when s=51, then you could interpret "betting it all" to be betting a=51 when s=51, which is clearly sub-optimal, because your goal is only to reach 100, not to exceed it, and there's no reason to risk complete ruin (loss of the entire stake). So in that sense, it makes sense to "bet it all" when s=50 but obviously not when s>50. But this is just speculation.
Perhaps, if we forget that by construction you can't bet a=51 when s=51, then you could interpret "betting it all" to be betting a=51 when s=51, which is clearly sub-optimal, because your goal is only to reach 100, not to exceed it, and there's no reason to risk complete ruin (loss of the entire stake). So in that sense, it makes sense to "bet it all" when s=50 but obviously not when s>50. But this is just speculation.
Context
StackExchange Computer Science Q#77145, answer score: 3
Revisions (0)
No revisions yet.