Standard view: reward comes from the environment

Reality: the agent rewards itself
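
One way to picture the contrast is as a difference in where the reward function lives. The sketch below is only illustrative: the class names, the `goal` value, and the toy counter dynamics are all hypothetical and not taken from the source. In the first loop the environment returns a reward alongside each observation; in the second the environment returns only a sensation, and the agent's own internal critic turns that sensation into a reward.

```python
class ExternalRewardEnv:
    """Standard view (hypothetical toy): the environment emits the reward."""

    def __init__(self, goal=3):
        self.goal = goal      # assumed goal state, for illustration only
        self.state = 0

    def step(self, action):
        self.state += action
        reward = 1.0 if self.state == self.goal else 0.0
        return self.state, reward   # observation AND reward come from outside


class SensationOnlyEnv:
    """Alternative view (hypothetical toy): the environment emits sensations only."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state += action
        return self.state           # no reward signal leaves the environment


class SelfRewardingAgent:
    """The reward computation sits inside the agent."""

    def __init__(self, goal=3):
        self.goal = goal      # the agent's own criterion, again an assumption

    def internal_critic(self, sensation):
        # The agent maps a raw sensation to a reward by itself.
        return 1.0 if sensation == self.goal else 0.0


def run_standard(actions):
    env = ExternalRewardEnv()
    return [env.step(a)[1] for a in actions]


def run_self_rewarding(actions):
    env = SensationOnlyEnv()
    agent = SelfRewardingAgent()
    return [agent.internal_critic(env.step(a)) for a in actions]
```

In this toy case both loops produce the same reward sequence; the only thing that changes is which component owns the reward computation, which is the point of the contrast above.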