-
Notes on Q-Learning Scalability
Some notes after revisiting Seohong Park's blog "Q-learning is not yet scalable", discussing the depth and width dimensions of task difficulty, TD error accumulation, and connections to WebGym.
-
Zero Intervention, Short Thinking, and More Actions - A New Paradigm for Multi-step RL for Language Models
This article discusses a new paradigm for multi-step RL for language models, built on zero intervention, short thinking, and more actions.
-
Are Auto-Regressive Language Models Simply Memorizing Answers or Learning to Reason?
This article briefly discusses whether, and why, auto-regressive language models can perform well on simple reasoning tasks.
-
A Complete Tutorial on Self-Attention & Transformer
This article explains the Transformer architecture thoroughly, tracing the path from RNNs to self-attention, and then to the full Transformer.