Search problem when the effects of actions are not deterministic
Problem
The standard search problem studied in AI contains:
- An initial state $A$;
- A final (goal) state $Z$;
- A set of actions that can change the state; each action takes as input a certain state and returns another state.
- A solution to such a problem is a sequence of actions starting at $A$ and ending at $Z$.
I am interested in the following variant:
- Each action can return one of several states; it is not known which state will be returned.
- A solution is a tree whose root is $A$, whose branching reflects the possible outcomes of each action, and in which every path from the root ends at $Z$.
For example: Suppose we start at state $A$ and do the action "ABRA". This action can return state $D$ or state $E$. In state $D$ we can do the action "DAMA" which returns state $Z$; in state $E$ we can do the action "EAGL" which can return either state $Z$ or state $Y$. In state $Y$ we can do the action "YADA" which returns state $Z$.
The motivation here is automatic construction of algorithms. Every solution is an algorithm for reaching from $A$ to $Z$. For example, the above solution can be read as the following algorithm:
Begin (state=A)
    ABRA
    If D then
        DAMA
        End (state=Z)
    Else If E then
        EAGL
        If Z then
            End (state=Z)
        Else (state=Y)
            YADA
            End (state=Z)
        End If
    End If
End

MY QUESTIONS:
- What are good references that deal with this kind of search problem?
- Are there any usable solvers for solving such problems (similar to e.g. planners or solvers for usual search problems)?
Solution
One helpful way to think of this is as a two-party game. When it is Alice's turn, she chooses an action; then, Bob chooses among the possible states that the action could return, and the system transitions to that state. We start at the start state $A$. Alice wins if she eventually reaches the goal state $Z$. Bob is trying to prevent Alice from ever reaching $Z$.
With this viewpoint in mind, there's a straightforward algorithm to determine whether Alice has a strategy to win this game. Let $S$ denote the set of all possible states. If $R \subseteq S$ is a set of states, define $f(R)$ to be the set of states $s$ such that there exists an action $a$ so that every state $t$ that could be returned by action $a$ (applied to $s$) satisfies $t \in R$. Also let $R_0 = \{Z\}$. Now compute the following sequence:
$$\begin{align*}
R_1 &= R_0 \cup f(R_0)\\
R_2 &= R_1 \cup f(R_1)\\
R_3 &= R_2 \cup f(R_2)\\
&\vdots
\end{align*}$$
Iterate until you reach a fixpoint, i.e., $R_{i+1}=R_i$. Then $R_i$ is the set of states from which Alice can force reaching $Z$: for every $s \in R_i$, there exists a strategy Alice can use to guarantee she will eventually reach the goal state $Z$. So, we can simply check whether $A$ is in $R_i$ or not.
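As a sketch, the fixpoint iteration could be implemented as follows in Python. The dictionary encoding of the transition relation (`trans[s][a]` = set of possible result states) is my own assumption, instantiated with the example from the question:

```python
def winning_states(trans, goal):
    """Fixpoint iteration: grow R from {goal} by repeatedly adding f(R),
    the states with some action whose every possible result lies in R."""
    R = {goal}
    while True:
        f_R = {s for s, actions in trans.items()
               if any(results <= R for results in actions.values())}
        if f_R <= R:          # fixpoint reached: R_{i+1} = R_i
            return R
        R |= f_R

# The example instance from the question.
trans = {
    "A": {"ABRA": {"D", "E"}},
    "D": {"DAMA": {"Z"}},
    "E": {"EAGL": {"Z", "Y"}},
    "Y": {"YADA": {"Z"}},
}
# Every state can force reaching "Z", so A is included and a solution exists.
assert winning_states(trans, "Z") == {"A", "D", "E", "Y", "Z"}
```

Note that this naive version rescans all states on every round, so it can be quadratic in the worst case.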
You can make the running time of this algorithm linear with a simple trick. Store the set $R_i$ explicitly in memory. Also maintain a separate data structure for "the fringe": the set of states that can reach $R_i$ in a single step. For each state $s$ in the fringe and each action $a$ available at $s$, maintain a counter of how many of the possible results of applying $a$ to $s$ are not yet in the current set $R_i$. On each iteration, you compute $R_{i+1}$ and update the counters; each counter only decreases, never increases. A state $s$ is in $f(R_i)$ exactly when some action available at $s$ has a counter that has decreased to zero, so computing $f(R_i)$ is just a matter of scanning the fringe for counters that have hit zero; then you compute $R_{i+1}$, update the fringe, and update the affected counters. The running time of this algorithm is linear in the size of "the graph", i.e., in the number of states plus the number of transitions (triples of a state $s$, an action $a$ available at $s$, and a state $t$ that could be returned by that action).
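The counter-based variant might look like this in Python (again under my own dictionary encoding of the transition relation); it does a single pass over all transitions, giving the linear running time:

```python
from collections import deque

def winning_states_linear(trans, goal):
    """Linear-time variant: count[(s, a)] holds how many possible results
    of action a at s are not yet known to be winning; when it reaches
    zero, action a guarantees reaching the winning set from s."""
    count = {(s, a): len(results)
             for s, actions in trans.items()
             for a, results in actions.items()}
    # preds[t] lists the (state, action) pairs that may return t.
    preds = {}
    for s, actions in trans.items():
        for a, results in actions.items():
            for t in results:
                preds.setdefault(t, []).append((s, a))
    winning = {goal}
    queue = deque([goal])
    while queue:
        t = queue.popleft()
        for s, a in preds.get(t, ()):
            count[(s, a)] -= 1
            if count[(s, a)] == 0 and s not in winning:
                winning.add(s)
                queue.append(s)
    return winning

# The example instance from the question.
trans = {
    "A": {"ABRA": {"D", "E"}},
    "D": {"DAMA": {"Z"}},
    "E": {"EAGL": {"Z", "Y"}},
    "Y": {"YADA": {"Z"}},
}
assert winning_states_linear(trans, "Z") == {"A", "D", "E", "Y", "Z"}
```

Here the fringe is represented implicitly by the queue: a state enters the queue exactly once, when it first becomes winning, and each transition decrements its counter exactly once.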
However, this straightforward algorithm is not very memory-efficient: it requires explicitly storing a large set of states, and it requires enumerating the state space, or a large part of it. It may therefore be adequate for small systems, but for large systems it is probably not attractive.
For larger systems, you might want to consult the model checking literature to see if there are better solutions.
For example, I suspect you might want to look at Alternating-time Temporal Logic (ATL), in the model checking literature. It deals with search problems for two-party games. I think I recall that there are model-checking algorithms for solving problems associated with these kinds of two-party games, but I'm not certain about that.
(Incidentally: another way to think about your problem is that the possible results from each action correspond to non-deterministic choices, and there is an adversary (Bob) who is resolving all non-determinism in the worst possible way.)
Context
StackExchange Computer Science Q#45739, answer score: 2