How Each Method Weights Gradient Samples

Each dot is a sample with importance ratio ρ; its height is the effective gradient weight f(ρ)·A. Increase the policy drift σ to see the variance explode for f(ρ) = ρ.

[Interactive figure. Controls: policy drift σ (default 0.3) and clip range ε (default 0.2). Each of the three panels reports the mean gradient, the standard deviation, and the max |weight| for one method.]
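
The statistics the figure reports can be approximated offline. The following is a minimal sketch, not the code behind the figure, and it rests on several assumptions: the log-ratio log ρ is drawn from a Gaussian with standard deviation σ, the term A is drawn from a standard normal, and the three weighting schemes (`f=rho`, `f=clip(rho)`, `f=1`) are illustrative picks that may differ from the methods shown in the panels. The function name `weight_stats` is hypothetical.

```python
import numpy as np


def weight_stats(sigma=0.3, eps=0.2, n=10_000, seed=0):
    """Summarize effective gradient weights f(rho) * A for three schemes."""
    rng = np.random.default_rng(seed)

    # Assumption: log-ratios are Gaussian, so rho is log-normal around 1.
    rho = np.exp(rng.normal(0.0, sigma, size=n))
    # Assumption: A is standard normal.
    adv = rng.normal(0.0, 1.0, size=n)

    # Illustrative weighting functions f(rho); not taken from the source.
    schemes = {
        "f=rho": rho,                                       # raw importance ratio
        "f=clip(rho)": np.clip(rho, 1.0 - eps, 1.0 + eps),  # ratio clipped to [1-eps, 1+eps]
        "f=1": np.ones_like(rho),                           # no importance correction
    }

    stats = {}
    for name, f in schemes.items():
        w = f * adv  # effective gradient weight per sample
        stats[name] = {
            "mean": float(w.mean()),
            "std": float(w.std()),
            "max_abs": float(np.abs(w).max()),
        }
    return stats


if __name__ == "__main__":
    for sigma in (0.3, 1.5):
        print(f"sigma = {sigma}")
        for name, s in weight_stats(sigma=sigma).items():
            print(f"  {name:12s} mean={s['mean']:+.3f} "
                  f"std={s['std']:.3f} max|w|={s['max_abs']:.3f}")
```

Under these assumptions, raising `sigma` inflates the standard deviation and the maximum absolute weight for `f=rho`, while the clipped weights stay bounded by (1 + ε) times the largest |A| in the batch.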