The Method Matrix: Privileged Info $\times$ Optimization


| Privileged Info / Optimization | PG (2025–26) | OPD (2026) | ICL (2024–25) |
| --- | --- | --- | --- |
| Optimal Trajectory | POPE, InT | OPSD, SDFT | Not novel |
| Optimal Policy | (not interesting) | Vanilla OPD | Not novel |
| Unstructured Reward | Guiding PRM | SDPO | RLEF |
| Structured Reward | Always used; not standalone | Not fine-grained | Not fine-grained |
