snippetMinor
How did the authors of Spice apply weakest precondition rule on their example car problem?
Viewed 0 times
problemthecarweakestruledidapplyauthorsexamplehow
Problem
While reading the example given in [1]., I couldn't understand how the authors set up the logic to compute the weakest preconditions (wp) in their car example in section 4.2.
The dynamics of the problem are given by -
$x' = x + 0.1v$
$v' = v + 0.1a + \epsilon$
I don't see the authors working on the former equation, only the latter. In addition, while reading about wp from the book [2], I learned that one needs to come up with a suitable postcondition. I believe the postcondition here is $v <= 1$. Why did the authors start the wp computation using $v_1 <=1 \land v_2 <= 1$? Also, in the next step they do $v_1<=1 \land v_1 + 0.1a_1 + \epsilon_1 <= 1$. Why did they only expand the equation on the right of the conjunction?
Here is how I imagined it should have been done (based on my very limited knowledge of wp computations -
We have the equation $v' = v + 0.1a + \epsilon$ and we have a horizon of 2. Therefore, we write this equation 2 times (because that's how RL algorithms are unrolled) -
$v_1 = v_0 + 0.1a_0 + \epsilon_0$
$v_2 = v_1 + 0.1a_1 + \epsilon_1$
Now we set up a postcondition on $v_2$ such that $v_2<=1$ and then work backwards. My understanding is probably wrong, but I'd like the reader to see where I am getting confused.
Please let me know if any clarification from my side is needed.
Regarding the relation between $x'$ and $x$ and also $v_2$ and $v_1$: My understanding here is that $x'$ represents the subsequent state of the variable $x$. This is because of the following line in the paper - "A policy, in interaction with the environment, generates trajectories (or rollouts)
$x_0,u_0, x_1,u_1, \ldots, u_{n-1}, x_n$ where $x_0 \sim p_0$, each $u_i \sim \pi(x_i)$, and each $x_{i+1} \sim P(x_i, u_i)$." Here we can see that each state can be obtained from the subsequent state except the initial state.
[1]. Anderson, Greg, Swarat Chaudhuri, and Isil Dillig. "Guiding Safe Exploration with Weakest Preconditions." International Conference on Learning Repre
The dynamics of the problem are given by -
$x' = x + 0.1v$
$v' = v + 0.1a + \epsilon$
I don't see the authors working on the former equation, only the latter. In addition, while reading about wp from the book [2], I learned that one needs to come up with a suitable postcondition. I believe the postcondition here is $v <= 1$. Why did the authors start the wp computation using $v_1 <=1 \land v_2 <= 1$? Also, in the next step they do $v_1<=1 \land v_1 + 0.1a_1 + \epsilon_1 <= 1$. Why did they only expand the equation on the right of the conjunction?
Here is how I imagined it should have been done (based on my very limited knowledge of wp computations -
We have the equation $v' = v + 0.1a + \epsilon$ and we have a horizon of 2. Therefore, we write this equation 2 times (because that's how RL algorithms are unrolled) -
$v_1 = v_0 + 0.1a_0 + \epsilon_0$
$v_2 = v_1 + 0.1a_1 + \epsilon_1$
Now we set up a postcondition on $v_2$ such that $v_2<=1$ and then work backwards. My understanding is probably wrong, but I'd like the reader to see where I am getting confused.
Please let me know if any clarification from my side is needed.
Regarding the relation between $x'$ and $x$ and also $v_2$ and $v_1$: My understanding here is that $x'$ represents the subsequent state of the variable $x$. This is because of the following line in the paper - "A policy, in interaction with the environment, generates trajectories (or rollouts)
$x_0,u_0, x_1,u_1, \ldots, u_{n-1}, x_n$ where $x_0 \sim p_0$, each $u_i \sim \pi(x_i)$, and each $x_{i+1} \sim P(x_i, u_i)$." Here we can see that each state can be obtained from the subsequent state except the initial state.
[1]. Anderson, Greg, Swarat Chaudhuri, and Isil Dillig. "Guiding Safe Exploration with Weakest Preconditions." International Conference on Learning Repre
Solution
Let's address your questions/confusions in order:
1.
I don't see the authors working on the former equation, only the latter.
That's because $x$ doesn't matter:
-
Why did the authors start the wp computation using $v_1 \leq 1 \wedge v_2 \leq 1$?
They state that their safety horizon is $2$, that is, they want to express that the safety condition holds both after $1$ step and after $2$ steps. In their notation, where $v_i$ refers to the value of $v$ after $i$ steps, that's exactly expressed by the condition $v_1 \leq 1 \wedge v_2 \leq 1$.
3.
Also, in the next step they do $v_1 \leq 1 \wedge v_1+0.1 a_1 + \epsilon_1 \leq 1$. Why did they only expand the equation on the right of the conjunction?
Because the 2nd iteration of the step function affects the variables subscripted with $2$. The LHS of the conjunction has only a $1$-subscripted variable, $v_1$. It is thus invariant under the substitution $[v_2 \mapsto _1+0.1 a_1 + \epsilon_1]$ performed to compute the weakest precondition.
Why didn't the authors include $v_0$?
(a) They state in the example that $v_0 = 0.9$, trivially satisfying $v_0 \leq 1$ and (b) $v_0$ is not affected by the dynamics.
Why did the author then work on both sides in the next step.
Because both sides refer to $v_1$, which is assigned to in that step. Whence the substitution applies meaningfully to both.
1.
I don't see the authors working on the former equation, only the latter.
That's because $x$ doesn't matter:
- it does not occur in the postcondition ($v \leq 1$) investigated here
- it does not occur in the RHS of the equation for $v$
-
Why did the authors start the wp computation using $v_1 \leq 1 \wedge v_2 \leq 1$?
They state that their safety horizon is $2$, that is, they want to express that the safety condition holds both after $1$ step and after $2$ steps. In their notation, where $v_i$ refers to the value of $v$ after $i$ steps, that's exactly expressed by the condition $v_1 \leq 1 \wedge v_2 \leq 1$.
3.
Also, in the next step they do $v_1 \leq 1 \wedge v_1+0.1 a_1 + \epsilon_1 \leq 1$. Why did they only expand the equation on the right of the conjunction?
Because the 2nd iteration of the step function affects the variables subscripted with $2$. The LHS of the conjunction has only a $1$-subscripted variable, $v_1$. It is thus invariant under the substitution $[v_2 \mapsto _1+0.1 a_1 + \epsilon_1]$ performed to compute the weakest precondition.
- (from your 1st reply comment)
Why didn't the authors include $v_0$?
(a) They state in the example that $v_0 = 0.9$, trivially satisfying $v_0 \leq 1$ and (b) $v_0$ is not affected by the dynamics.
- (from your 2nd reply comment)
Why did the author then work on both sides in the next step.
Because both sides refer to $v_1$, which is assigned to in that step. Whence the substitution applies meaningfully to both.
Context
StackExchange Computer Science Q#165368, answer score: 2
Revisions (0)
No revisions yet.