How Each Method Weights Gradient Samples

Each dot is a sample with importance ratio ρ. Height = effective gradient weight f(ρ)·A. Adjust policy drift to see variance explode for f=ρ.

0.3
0.2

f(ρ) = 1  Policy Improvement

Mean gradient:
Std:
Max |weight|:

f(ρ) = clip  PPO

Mean gradient:
Std:
Max |weight|:

f(ρ) = ρ  Policy Gradient

Mean gradient:
Std:
Max |weight|: