▶ Play
The causal mask ensures each token can only attend to itself and previous tokens, preventing the decoder from peeking at future positions.