reinforcement-learning | Jack (Hao) Bai

Mar 11, 2026	rl What Does Flow-Matching Bring to Deep RL?
Feb 15, 2026	rl Generalizable Value Functions and Emotions (?)
Jan 09, 2026	rl How to Use Privileged Information in RL: On-policy Distillation
Nov 22, 2025	rl Adaptive Sampling and Curriculum Methods
Aug 07, 2025	rl Challenges in Scaling Q-Learning
May 27, 2025	rl Policy Optimization without a Critic: The GRPO Family
Mar 15, 2025	rl Can Language Models Be Critic Functions?
Oct 22, 2024	rl RL on Language under Single-step Settings
May 22, 2024	rl Importance Sampling: Why and How
Apr 07, 2024	rl Policy Improvement Theorem
Mar 13, 2024	rl The Policy Gradient Family: PG, PPO, and AC
Feb 18, 2024	rl Bellman Operator Identities