Click a state-action pair to see which Q-table entries each update method modifies

Select (s_t, a_t):

Semi-gradient

only ∇_θQ(s_t, a_t)

Full gradient

∇_θQ(s_t, a_t) − γ∇_θQ(s_{t+1}, a*)

Click a (s, a) pair above to compare the two update rules.

prediction: +αδ · target: −αγδ · all other entries: unchanged
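For a tabular Q-function, where each parameter is a single table entry, the two rules can be sketched in a few lines. This is a minimal illustration rather than the code behind the demo: it assumes δ is the usual TD error r + γ·Q(s_{t+1}, a*) − Q(s_t, a_t) with a* the greedy action in s_{t+1}, and the function names, the reward `r`, and the default `alpha` and `gamma` values are all hypothetical.

```python
def td_error(Q, s, a, r, s_next, gamma):
    """Return (delta, a_star): the TD error and the greedy action in s_next.

    Assumes delta = r + gamma * max_b Q[s_next][b] - Q[s][a].
    """
    a_star = max(range(len(Q[s_next])), key=lambda b: Q[s_next][b])
    delta = r + gamma * Q[s_next][a_star] - Q[s][a]
    return delta, a_star


def semi_gradient_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Treat the target as a constant: only the prediction entry moves."""
    delta, _ = td_error(Q, s, a, r, s_next, gamma)
    new_Q = [row[:] for row in Q]          # copy so the input table is untouched
    new_Q[s][a] += alpha * delta           # prediction: +alpha * delta
    return new_Q


def full_gradient_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Differentiate through the target too: two entries move."""
    delta, a_star = td_error(Q, s, a, r, s_next, gamma)
    new_Q = [row[:] for row in Q]
    new_Q[s][a] += alpha * delta                    # prediction: +alpha * delta
    new_Q[s_next][a_star] -= alpha * gamma * delta  # target: -alpha * gamma * delta
    return new_Q


def changed_entries(old, new):
    """List the (state, action) indices whose values differ between two tables."""
    return [(s, a)
            for s, row in enumerate(old)
            for a, value in enumerate(row)
            if value != new[s][a]]


# Hypothetical 2-state, 2-action table and a single transition (s=0, a=1, r=1, s'=1).
Q = [[0.0, 0.0],
     [1.0, 2.0]]
semi = semi_gradient_update(Q, s=0, a=1, r=1.0, s_next=1)
full = full_gradient_update(Q, s=0, a=1, r=1.0, s_next=1)
print("semi-gradient changed:", changed_entries(Q, semi))
print("full gradient changed:", changed_entries(Q, full))
```

In this sketch the semi-gradient rule modifies only the selected entry, while the full-gradient rule also pushes the target entry in the opposite direction, scaled by γ. If the selected pair and the target pair happen to coincide, the two adjustments land on the same entry and combine to α(1 − γ)δ.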