Interactive: Policy Improvement Theorem

Click the directional triangles in each cell to change $\pi'$. The theorem guarantees that if $\mathbb{E}_{a \sim \pi'(\cdot \mid s)}[A^\pi(s, a)] \geq 0$ for every state $s$, then $V^{\pi'}(s) \geq V^\pi(s)$ for every state $s$.

Legend: Goal (+1), Wall, Penalty (−1), $V^\pi(s)$, $V^{\pi'}(s)$

Left grid: current policy $\pi$ (fixed)

Right grid: new policy $\pi'$ (click to edit)

Click an arrow on the right grid to modify $\pi'$ and see the theorem in action.
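
For readers without access to the interactive grid, the same check can be sketched offline. The snippet below is a minimal illustration, not the widget's actual implementation: the 3×4 layout, the cell positions in `GOAL`, `PENALTY`, and `WALL`, the discount `GAMMA = 0.9`, deterministic moves, and the all-right starting policy are all assumptions chosen for the example. It evaluates $\pi$ exactly, builds $\pi'$ greedily with respect to $Q^\pi$ (which makes $A^\pi(s, \pi'(s)) \geq 0$ in every state), and then compares $V^{\pi'}$ with $V^\pi$.

```python
import numpy as np

# Hypothetical layout and discount, chosen for illustration only.
GAMMA = 0.9
ROWS, COLS = 3, 4
GOAL, PENALTY, WALL = (0, 3), (1, 3), (1, 1)
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

# Non-terminal, non-wall cells; the goal and penalty cells are terminal with value 0.
STATES = [(r, c) for r in range(ROWS) for c in range(COLS)
          if (r, c) not in (GOAL, PENALTY, WALL)]


def step(state, action):
    """Deterministic move; hitting the border or the wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS) or nxt == WALL:
        nxt = state
    reward = 1.0 if nxt == GOAL else -1.0 if nxt == PENALTY else 0.0
    return nxt, reward


def evaluate(policy):
    """Exact policy evaluation: solve (I - gamma * P) V = R over non-terminal states."""
    idx = {s: i for i, s in enumerate(STATES)}
    n = len(STATES)
    P, R = np.zeros((n, n)), np.zeros(n)
    for s in STATES:
        nxt, reward = step(s, policy[s])
        R[idx[s]] = reward
        if nxt in idx:  # transitions into terminal cells contribute no future value
            P[idx[s], idx[nxt]] = 1.0
    v = np.linalg.solve(np.eye(n) - GAMMA * P, R)
    return {s: float(v[idx[s]]) for s in STATES}


def advantage(V, s, a):
    """A(s, a) = Q(s, a) - V(s), with Q(s, a) = r + gamma * V(s')."""
    nxt, reward = step(s, a)
    return reward + GAMMA * V.get(nxt, 0.0) - V[s]


# Current policy pi: always move right (an arbitrary starting point).
pi_old = {s: "R" for s in STATES}
V_old = evaluate(pi_old)

# New policy pi': greedy with respect to Q^pi, so its advantage under pi is >= 0.
pi_new = {s: max(ACTIONS, key=lambda a: advantage(V_old, s, a)) for s in STATES}
V_new = evaluate(pi_new)

for s in STATES:
    print(s, pi_old[s], "->", pi_new[s],
          f"A={advantage(V_old, s, pi_new[s]):+.3f}",
          f"V_old={V_old[s]:+.3f}", f"V_new={V_new[s]:+.3f}")
```

Because both policies are deterministic here, the expectation $\mathbb{E}_{a \sim \pi'(\cdot \mid s)}[A^\pi(s, a)]$ reduces to the single term $A^\pi(s, \pi'(s))$. Editing an entry of `pi_new` by hand so that its advantage becomes negative shows the other side: the premise fails, and the guarantee no longer applies to that policy.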