How Each Method Weights Gradient Samples
Each dot is a sample with importance ratio ρ. Height = effective gradient weight f(ρ)·A. Adjust policy drift to see variance explode for f=ρ.
f(ρ) = 1 Policy Improvement
Mean gradient:—
Std:—
Max |weight|:—
f(ρ) = clip PPO
Mean gradient:—
Std:—
Max |weight|:—
f(ρ) = ρ Policy Gradient
Mean gradient:—
Std:—
Max |weight|:—