Self-attention as database retrieval: each query is scored against every key to obtain weighting coefficients, and the output for that query is the sum of the corresponding values weighted by those coefficients.
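
The retrieval analogy can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the text above does not say how the coefficients are computed, so the sketch assumes the common scaled dot-product form (dot product of query and key, divided by the square root of the dimension, then a softmax). The names `softmax`, `attend`, and `self_attention` are hypothetical helpers chosen for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative coefficients that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Soft lookup of one query against all keys.

    Assumption: scaled dot-product scoring followed by softmax.
    Returns (coefficients, weighted sum of values).
    """
    d = len(query)
    # One score per key: how well does this key match the query?
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Weighted sum of the values, one output component at a time.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(dim)]
    return weights, output

def self_attention(queries, keys, values):
    """Apply the same soft lookup independently for every query."""
    return [attend(q, keys, values)[1] for q in queries]

if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    weights, output = attend([1.0, 0.0], keys, values)
    print("coefficients:", weights)
    print("retrieved value:", output)
```

Unlike an exact-match database lookup, every key contributes to the result; the query `[1.0, 0.0]` matches the first key best, so the first value receives the larger coefficient while the second value still contributes a smaller share.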