Click a state-action pair to see which Q-table entries each update method modifies
Semi-gradient
only $\nabla_\theta Q(s_t, a_t)$
Full gradient
$\nabla_\theta Q(s_t, a_t) - \gamma \nabla_\theta Q(s_{t+1}, a^*)$
Click an (s, a) pair above to compare the two update rules.
prediction: +αδ
target: −αγδ
unchanged
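
The same comparison can be sketched in code. This is a minimal tabular illustration, not the widget's implementation. It assumes Q is a list of rows indexed by state and then action, that a* is the greedy action in the next state, and that δ = r + γ·Q(s', a*) − Q(s, a). The function names and the defaults α = 0.1 and γ = 0.9 are hypothetical. In the tabular case the gradient of Q(s, a) with respect to the table is one-hot, so each rule touches at most two entries.

```python
def td_error(Q, s, a, r, s_next, gamma):
    """Return (delta, a_star) for a tabular Q stored as a list of rows."""
    # Greedy action in the next state (assumed definition of a*).
    a_star = max(range(len(Q[s_next])), key=lambda b: Q[s_next][b])
    delta = r + gamma * Q[s_next][a_star] - Q[s][a]
    return delta, a_star


def semi_gradient_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Treat the target as a constant: only the prediction entry moves."""
    delta, _ = td_error(Q, s, a, r, s_next, gamma)
    Q[s][a] += alpha * delta                      # prediction: +alpha*delta
    return delta


def full_gradient_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Differentiate through the target too: two entries move."""
    delta, a_star = td_error(Q, s, a, r, s_next, gamma)
    Q[s][a] += alpha * delta                      # prediction: +alpha*delta
    Q[s_next][a_star] -= alpha * gamma * delta    # target: -alpha*gamma*delta
    return delta


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action table and one transition (s=0, a=1, r=1, s'=1).
    Q_semi = [[0.0, 0.0], [1.0, 2.0]]
    Q_full = [[0.0, 0.0], [1.0, 2.0]]
    semi_gradient_update(Q_semi, 0, 1, 1.0, 1)
    full_gradient_update(Q_full, 0, 1, 1.0, 1)
    print("semi-gradient:", Q_semi)
    print("full gradient:", Q_full)
```

For this transition δ is about 2.8. The semi-gradient rule changes only Q[0][1], while the full-gradient rule also lowers the target entry Q[1][1]. Every other entry is unchanged under both rules.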