Each row of is a weighted combination of all token embeddings, where the weights come from that token's attention over the sequence.