Four pooling strategies for extracting embeddings from decoder-only LLMs: EOS pooling, mean pooling, latent attention, and Causal2Vec.
EOS Pooling

Decoder-only LLM with causal attention; embedding = h_EOS. Only the EOS hidden state is used. Simple, and it works well with causal attention because the EOS token attends to every preceding token. Used by Qwen3-Embedding and E5-Mistral.
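As a concrete sketch of EOS pooling: the snippet below (NumPy, toy shapes; the function name `eos_pool` is illustrative and not from any specific library) selects the hidden state of the last non-padded token in each sequence, which is the EOS position when sequences are right-padded.

```python
import numpy as np

def eos_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Return the hidden state at the last non-padded (EOS) position per sequence.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    last_idx = attention_mask.sum(axis=1) - 1  # index of EOS token per sequence
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

# Toy batch: 2 sequences of lengths 3 and 2, hidden dim 4.
h = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
emb = eos_pool(h, mask)  # emb[0] is h[0, 2]; emb[1] is h[1, 1]
```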
Mean Pooling

Decoder-only LLM with bidirectional attention; embedding = mean(h_1, ..., h_n). All content-token hidden states are averaged (instruction tokens excluded). Requires bidirectional attention so that every token is equally contextualized. Used by LLM2Vec, KaLM, GritLM, and DiffEmbed.
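A minimal mask-weighted mean-pooling sketch (NumPy; `mean_pool` is an illustrative name). Here the mask is assumed to already zero out both padding and, per the description above, instruction tokens, so only content tokens contribute to the average.

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mask-weighted mean over content-token hidden states.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1,
    with padding and instruction tokens already set to 0.
    """
    mask = attention_mask[..., None].astype(float)   # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)      # sum of content tokens
    counts = mask.sum(axis=1).clip(min=1e-9)         # avoid division by zero
    return summed / counts

# Toy batch: 1 sequence, 3 tokens (last one masked out), dim 2.
h = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]])
mask = np.array([[1, 1, 0]])
emb = mean_pool(h, mask)  # average of the first two token states
```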
Latent Attention

Decoder-only LLM with bidirectional attention. The token hidden states form the queries Q = [h_1, ..., h_n] in a cross-attention softmax(QK^T)V against a learnable latent array with K = V (512 × d); an MLP followed by mean pooling produces the embedding. This learnable cross-attention over a latent dictionary is more expressive than mean or EOS pooling: decoder tokens query a learned knowledge bank. Used by NV-Embed (Lee et al., 2024).
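A single-head NumPy sketch of the latent-attention pooling described above, not NV-Embed's actual implementation (which is multi-head with trained weights). Random matrices stand in for the learned latent array and MLP, and the 1/sqrt(d) factor is the standard scaled dot-product variant, added here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 6, 8, 512  # n tokens, hidden dim d, latent array of m = 512 entries

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention_pool(H, latents, W1, W2):
    """Cross-attention softmax(QK^T)V with Q = token states, K = V = latents,
    followed by a 2-layer MLP and mean pooling."""
    Q, K = H, latents                            # (n, d), (m, d)
    O = softmax(Q @ K.T / np.sqrt(d)) @ K        # cross-attention output, (n, d)
    O = np.maximum(O @ W1, 0.0) @ W2             # MLP with ReLU, back to (n, d)
    return O.mean(axis=0)                        # mean pool over tokens -> (d,)

H = rng.normal(size=(n, d))           # stand-in for decoder hidden states
latents = rng.normal(size=(m, d))     # stand-in for the learnable latent array
W1 = rng.normal(size=(d, 4 * d))
W2 = rng.normal(size=(4 * d, d))
emb = latent_attention_pool(H, latents, W1, W2)  # final embedding, shape (d,)
```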
Causal2Vec

A BERT-style encoder compresses the input's global context into a single Contextual token (C), which an MLP projects into the LLM's input space; the sequence fed to the decoder is [Inst, C, t_1, ..., t_n, EOS]. The decoder-only LLM runs with causal attention, and because C comes first, every token can attend to it: h_C carries global context while h_EOS summarizes the sequence end. The final embedding concatenates the two hidden states, [h_C ; h_EOS]. Used by Causal2Vec (Lin et al., 2025).
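The final step above is just a concatenation of the two hidden states; a minimal sketch (NumPy, hypothetical `causal2vec_embed` name, toy vectors standing in for real hidden states):

```python
import numpy as np

def causal2vec_embed(h_C: np.ndarray, h_EOS: np.ndarray) -> np.ndarray:
    """Embedding = [h_C ; h_EOS]: Contextual-token and EOS hidden states concatenated,
    giving a vector of twice the hidden dimension."""
    return np.concatenate([h_C, h_EOS], axis=-1)

d = 4
h_C = np.ones(d)     # stand-in for the Contextual-token hidden state (global context)
h_EOS = np.zeros(d)  # stand-in for the EOS hidden state (sequence-end summary)
emb = causal2vec_embed(h_C, h_EOS)  # shape (2 * d,)
```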