reinforcement-learning | Jack (Hao) Bai

Mar 11, 2026	What Does Flow-Matching Bring to Deep RL?
Feb 15, 2026	Generalizable Value Functions and Emotions (?)
Jan 09, 2026	How to Use Privileged Information in RL: On-policy Distillation
Nov 22, 2025	Adaptive Sampling and Curriculum Methods
Oct 01, 2025	Position: Why Web is a Good Environment to Study RL?
Aug 07, 2025	Challenges in Scaling Q-Learning
Jul 22, 2025	Are Multi-step Agents Overthinking?
May 27, 2025	Policy Optimization without a Critic: The GRPO Family
Mar 15, 2025	Can Language Models Be Critic Functions?
Oct 22, 2024	RL on Language under Single-step Settings
May 22, 2024	Importance Sampling: Why and How
Apr 07, 2024	Policy Improvement Theorem
Mar 13, 2024	The Policy Gradient Family: PG, PPO, and AC
Feb 18, 2024	Bellman Operator Identities
Feb 01, 2018	Ilya Sutskever: Meta Learning and Self Play