$\displaystyle J(\pi) \;=\; \sum_{\tau} R(\tau) \cdot P^{\pi}(\tau)$
[Interactive figure: each part of the formula corresponds to an element of the figure. Two policies can be compared: $\pi_\theta$ (prefers ↗) and $\pi_{\theta'}$ (prefers ↘).]
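To make the sum concrete, here is a minimal sketch that evaluates $J(\pi)$ for two policies over a toy set of trajectories. The trajectory names (`up`, `down`), their returns, and the probabilities are all made-up values chosen for illustration; they are not taken from the figure.

```python
# Hypothetical toy setup: two possible trajectories with made-up returns R(tau).
returns = {"up": 2.0, "down": -1.0}

# Hypothetical trajectory distributions P^pi(tau) induced by two policies.
p_theta = {"up": 0.75, "down": 0.25}        # stands in for the policy that prefers the upward path
p_theta_prime = {"up": 0.25, "down": 0.75}  # stands in for the policy that prefers the downward path


def expected_return(returns, traj_probs):
    """Compute J(pi) = sum over tau of R(tau) * P^pi(tau)."""
    # The trajectory probabilities must form a valid distribution.
    if abs(sum(traj_probs.values()) - 1.0) > 1e-9:
        raise ValueError("trajectory probabilities must sum to 1")
    return sum(returns[tau] * prob for tau, prob in traj_probs.items())


j_theta = expected_return(returns, p_theta)
j_theta_prime = expected_return(returns, p_theta_prime)

print(f"J(pi_theta)  = {j_theta}")        # 2.0*0.75 + (-1.0)*0.25 = 1.25
print(f"J(pi_theta') = {j_theta_prime}")  # 2.0*0.25 + (-1.0)*0.75 = -0.25
```

The returns $R(\tau)$ are the same for both policies; only the weights $P^{\pi}(\tau)$ change, which is why shifting probability toward the higher-return trajectory raises $J(\pi)$.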