How noisy top-k gating routes a token to experts. Grayed experts receive zero weight.
Left: noisy logits, i.e. $x \cdot W_g$ plus noise. Right: gating weights after keeping the top-k logits and applying softmax. Only the top-k experts (colored) are active.
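
The routing step in the figure can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation. It assumes the common formulation in which the noise is standard Gaussian scaled by a softplus of a second projection; the names `W_noise`, `noisy_top_k_gating`, and `rng` are hypothetical and do not come from the figure.

```python
import numpy as np

def noisy_top_k_gating(x, W_g, W_noise, k, rng):
    """Return gating weights over all experts for a single token x.

    x:       (d_model,) token representation
    W_g:     (d_model, num_experts) gating projection
    W_noise: (d_model, num_experts) noise-scale projection (assumed form)
    k:       number of experts to keep active
    """
    # Left panel: clean logits plus input-dependent Gaussian noise.
    clean_logits = x @ W_g
    noise_scale = np.logaddexp(0.0, x @ W_noise)  # numerically stable softplus
    noise = rng.standard_normal(clean_logits.shape)
    noisy_logits = clean_logits + noise * noise_scale

    # Keep the k largest logits; mask the rest with -inf.
    top_k_idx = np.argpartition(noisy_logits, -k)[-k:]
    masked = np.full_like(noisy_logits, -np.inf)
    masked[top_k_idx] = noisy_logits[top_k_idx]

    # Right panel: softmax over the masked logits.
    # exp(-inf) == 0, so masked-out experts get exactly zero weight.
    shifted = masked - noisy_logits[top_k_idx].max()
    weights = np.exp(shifted)
    return weights / weights.sum()

# Example: route one 8-dimensional token across 4 experts, keeping k = 2.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W_g = rng.standard_normal((8, 4))
W_noise = rng.standard_normal((8, 4))
gates = noisy_top_k_gating(x, W_g, W_noise, k=2, rng=rng)
```

Masking with `-inf` before the softmax, rather than zeroing the weights afterwards, keeps the surviving weights normalized to sum to 1 without a separate renormalization step.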