Blogs

Research notes on reinforcement learning, language models, and agentic reasoning by Jack (Hao) Bai.

May 28, 2026 horo Mechanical Watchmaking
May 12, 2026 rl AI for Scientific Discovery: Three Milestones and a Benchmark Map
Apr 07, 2026 rl Video Models and World Action Modeling
Apr 01, 2026 psy Analytical Psychology
Mar 11, 2026 rl What Does Flow-Matching Bring to Deep RL?
Feb 15, 2026 rl Generalizable Value Functions and Introverted Intuition (Ni)
Feb 12, 2026 music The Pentatonic Scale
Feb 02, 2026 Vincent Sitzmann: The Bitter Lesson of Computer Vision
Jan 09, 2026 rl How to Use Privileged Information in RL: On-policy Distillation

Dec 14, 2025 llm Autoregressive Embedding Models: Training, Attention, and Performance
Dec 13, 2025 music Non-Diatonic Notes
Nov 25, 2025 Ilya Sutskever: From the Age of Scaling to the Age of Research
Nov 22, 2025 rl Adaptive Sampling and Curriculum Methods
Oct 01, 2025 agent Position: Why Web is a Good Environment to Study RL?
Sep 18, 2025 phil Foundations of Reductionism
Sep 01, 2025 llm Pretraining, Post-training, and Test-Time Reasoning
Aug 24, 2025 music Jazz Chords and Their Variants
Aug 07, 2025 rl Challenges in Scaling Q-Learning
Jul 22, 2025 agent Are Multi-step Agents Overthinking?
Jul 04, 2025 info Kolmogorov Complexity
Jun 13, 2025 music The Komuro Progression
May 27, 2025 rl Policy Optimization without a Critic: The GRPO Family
Mar 15, 2025 rl Can Language Models Be Critic Functions?

Oct 22, 2024 rl RL on Language under Single-step Settings
Aug 01, 2024 llm LLM Optimization Basics: Memory
Jun 15, 2024 llm LLM Optimization Basics: Time
May 22, 2024 rl Importance Sampling: Why and How
Apr 07, 2024 rl Policy Improvement Theorem
Mar 13, 2024 rl The Policy Gradient Family: PG, PPO, and AC
Feb 18, 2024 rl Bellman Operator Identities

Dec 16, 2023 llm Mixture of Experts Explained
Sep 09, 2023 llm RoPE and M-RoPE: Rotation, Decay, and Multimodal Axes
Aug 15, 2023 Ilya Sutskever: An Observation on Generalization
Jun 07, 2023 llm Self-Attention Layer and The Transformers Architecture
May 20, 2023 math Dynamic Programming: Foundations
Apr 27, 2023 llm Backpropagation

Feb 01, 2018 Ilya Sutskever: Meta Learning and Self Play