language-model | Jack (Hao) Bai

Feb 15, 2026	Generalizable Value Functions and Emotions (?)
Jan 09, 2026	How to Use Privileged Information in RL: On-policy Distillation
Dec 14, 2025	Autoregressive Embedding Models: Training, Attention, and Performance
Sep 01, 2025	Pretraining, Post-training, and Test-Time Reasoning
Jul 22, 2025	Are Multi-step Agents Overthinking?
May 27, 2025	Policy Optimization without a Critic: The GRPO Family
Mar 15, 2025	Can Language Models Be Critic Functions?
Dec 16, 2023	Mixture of Experts Explained
Apr 27, 2023	Backpropagation