Four pooling strategies for extracting embeddings from decoder-only LLMs: EOS pooling, mean pooling, latent attention, and Causal2Vec.
EOS Pooling

Decoder-only LLM with causal attention; embedding = h_EOS. Only the EOS hidden state is used. Simple, and it works well with causal attention because the EOS token attends to every preceding token. Used by Qwen3-Embedding and E5-Mistral.
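As a concrete sketch of EOS pooling: the snippet below (NumPy, toy shapes; the function name `eos_pool` is illustrative and not from any specific library) selects the hidden state of the last non-padded token in each sequence, which is the EOS position when sequences are right-padded.

```python
import numpy as np

def eos_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Return the hidden state at the last non-padded (EOS) position per sequence.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    last_idx = attention_mask.sum(axis=1) - 1  # index of EOS token per sequence
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

# Toy batch: 2 sequences of lengths 3 and 2, hidden dim 4.
h = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
emb = eos_pool(h, mask)  # emb[0] is h[0, 2]; emb[1] is h[1, 1]
```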
Mean Pooling

Decoder-only LLM with bidirectional attention; embedding = mean(h_1, ..., h_n). All content-token hidden states are averaged (instruction tokens excluded). Requires bidirectional attention so that every token is equally contextualized. Used by LLM2Vec, KaLM, GritLM, and DiffEmbed.
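A minimal mask-weighted mean-pooling sketch (NumPy; `mean_pool` is an illustrative name). Here the mask is assumed to already zero out both padding and, per the description above, instruction tokens, so only content tokens contribute to the average.

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mask-weighted mean over content-token hidden states.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1,
    with padding and instruction tokens already set to 0.
    """
    mask = attention_mask[..., None].astype(float)   # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)      # sum of content tokens
    counts = mask.sum(axis=1).clip(min=1e-9)         # avoid division by zero
    return summed / counts

# Toy batch: 1 sequence, 3 tokens (last one masked out), dim 2.
h = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]])
mask = np.array([[1, 1, 0]])
emb = mean_pool(h, mask)  # average of the first two token states
```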
Latent Attention

Decoder-only LLM with bidirectional attention. The token hidden states form the queries Q = [h_1, ..., h_n] in a cross-attention softmax(QK^T)V against a learnable latent array with K = V (512 × d); an MLP followed by mean pooling produces the embedding. This learnable cross-attention over a latent dictionary is more expressive than mean or EOS pooling: decoder tokens query a learned knowledge bank. Used by NV-Embed (Lee et al., 2024).
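A single-head NumPy sketch of the latent-attention pooling described above, not NV-Embed's actual implementation (which is multi-head with trained weights). Random matrices stand in for the learned latent array and MLP, and the 1/sqrt(d) factor is the standard scaled dot-product variant, added here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 6, 8, 512  # n tokens, hidden dim d, latent array of m = 512 entries

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention_pool(H, latents, W1, W2):
    """Cross-attention softmax(QK^T)V with Q = token states, K = V = latents,
    followed by a 2-layer MLP and mean pooling."""
    Q, K = H, latents                            # (n, d), (m, d)
    O = softmax(Q @ K.T / np.sqrt(d)) @ K        # cross-attention output, (n, d)
    O = np.maximum(O @ W1, 0.0) @ W2             # MLP with ReLU, back to (n, d)
    return O.mean(axis=0)                        # mean pool over tokens -> (d,)

H = rng.normal(size=(n, d))           # stand-in for decoder hidden states
latents = rng.normal(size=(m, d))     # stand-in for the learnable latent array
W1 = rng.normal(size=(d, 4 * d))
W2 = rng.normal(size=(4 * d, d))
emb = latent_attention_pool(H, latents, W1, W2)  # final embedding, shape (d,)
```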
Causal2Vec

A BERT-style encoder compresses the input's global context into a single Contextual token (C), which an MLP projects into the LLM's input space; the sequence fed to the decoder is [Inst, C, t_1, ..., t_n, EOS]. The decoder-only LLM runs with causal attention, and because C comes first, every token can attend to it: h_C carries global context while h_EOS summarizes the sequence end. The final embedding concatenates the two hidden states, [h_C ; h_EOS]. Used by Causal2Vec (Lin et al., 2025).
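The final step above is just a concatenation of the two hidden states; a minimal sketch (NumPy, hypothetical `causal2vec_embed` name, toy vectors standing in for real hidden states):

```python
import numpy as np

def causal2vec_embed(h_C: np.ndarray, h_EOS: np.ndarray) -> np.ndarray:
    """Embedding = [h_C ; h_EOS]: Contextual-token and EOS hidden states concatenated,
    giving a vector of twice the hidden dimension."""
    return np.concatenate([h_C, h_EOS], axis=-1)

d = 4
h_C = np.ones(d)     # stand-in for the Contextual-token hidden state (global context)
h_EOS = np.zeros(d)  # stand-in for the EOS hidden state (sequence-end summary)
emb = causal2vec_embed(h_C, h_EOS)  # shape (2 * d,)
```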