The transition function in a Markov decision process
Problem
A Markov decision process is typically described as a tuple $\langle A,U,T,R \rangle $ where
- $A$ is the state space
- $U$ is the action space
- $T\colon A \times U \times A \to [0,\infty)$ is the state transition probability function
- $R\colon A \times U \times A \to \mathbb{R}$ is the reward function
What does this $A \times U \times A$ actually mean in terms of the MDP? It is written in all the papers, but never explained. Does it mean that all the states $a \in A$ are multiplied with all the actions $u \in U$? Or something completely different?
Solution
The notation $T\colon A\times U\times A\to[0,\infty)$ means a function with three parameters, the first from $A$, the second from $U$, and the third from $A$, which outputs a non-negative real.
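To spell out the domain: $A \times U \times A$ is a Cartesian product, not a multiplication of states by actions. It is the set of all ordered triples

$$A \times U \times A = \{(a,u,b) : a \in A,\ u \in U,\ b \in A\},$$

so when $A$ and $U$ are finite it has $|A|^2\,|U|$ elements, and $T$ assigns one non-negative real to each triple. Here $a$, $u$, $b$ are just placeholder names for the current state, the action, and the next state.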
It is somewhat strange that the codomain is stated as $[0,\infty)$ rather than $[0,1]$, since the values of $T$ are probabilities. A perhaps better way of thinking of $T$ is as a function from $A \times U$ to the set of probability distributions over $A$. That is, $T$ takes a state and an action, and outputs a distribution over the set of states.
The semantics of $T$ are as follows: when at state $a$ and performing action $u$, the probability of moving to state $b$ is $T(a,u,b)$. Thus for all $a \in A$ and $u \in U$ we must have $\sum_{b \in A} T(a,u,b) = 1$.
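As a rough illustration of these semantics, here is a minimal Python sketch for a finite MDP. The two states, two actions, and all probabilities are made up for the example, and the helper names `transition` and `is_valid` are hypothetical. The table is stored as a map from $(a,u)$ to a distribution over next states, which matches the "function from $A \times U$ to distributions over $A$" view.

```python
from itertools import product

# Hypothetical two-state, two-action MDP; names and numbers are illustrative only.
A = ["s0", "s1"]        # state space
U = ["stay", "move"]    # action space

# T[(a, u)] is a distribution over next states b, so T(a, u, b) = T[(a, u)][b].
T = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.0, "s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}


def transition(a, u, b):
    """Probability of moving to state b when performing action u in state a."""
    return T[(a, u)][b]


def is_valid(table, states, actions, tol=1e-9):
    """Check that every (a, u) pair maps to a probability distribution over states."""
    for a, u in product(states, actions):
        probs = [table[(a, u)][b] for b in states]
        if any(p < 0 or p > 1 for p in probs):
            return False
        # The probabilities over all next states b must sum to 1.
        if abs(sum(probs) - 1.0) > tol:
            return False
    return True
```

For instance, `transition("s0", "move", "s1")` looks up one triple $(a,u,b)$ from $A \times U \times A$, and `is_valid(T, A, U)` checks the constraint $\sum_{b \in A} T(a,u,b) = 1$ for every state-action pair.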
Context
StackExchange Computer Science Q#57716, answer score: 4