KV Cache Flow: Cosmos Policy vs Fast-WAM

how clean conditioning's K/V is computed once and reused — same idea, two architectures

[Figure: K/V cache flow for both architectures, shown in its initial state with both K/V caches empty]

Cosmos Policy
cache inside one backbone: clean slots reused across 5 denoising steps
unified DiT (~2B), reads K/V from the K/V cache

Fast-WAM
cache across two backbones: video runs once, action iterates 10 steps
video branch (Video DiT) and action branch (Action DiT), connected through one K/V cache

Legend: clean conditioning; noised (initial); recompute K/V (every step); cached K/V (read, not recomputed)

The flow at a glance. Cosmos Policy's cache lives inside one DiT: across the 5 denoising steps, the K/V for the 4 clean state slots is computed once and read at every step, while the K/V for the action, future, and value slots is recomputed at every step because their noise level changes. Fast-WAM's cache spans two DiTs: the 5B video DiT runs once on the clean first observation and its K/V is stored, and the 1B action expert reads that same K/V at each of its 10 denoising steps. The video branch does not run again until the next action chunk.
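
The two cache patterns described above can be illustrated with a small bookkeeping sketch. It does not run any attention; it only counts how often per-slot K/V would be computed versus read from the cache. The names `KVCounter`, `cosmos_style`, and `fast_wam_style` are hypothetical, and treating action, future, and value as one slot each and the first observation as a single slot is an assumption made for illustration, not something either model specifies here.

```python
from dataclasses import dataclass, field


@dataclass
class KVCounter:
    """Counts per-slot K/V projections that are computed vs. read from cache."""
    computed: int = 0
    cache_reads: int = 0
    cache: dict = field(default_factory=dict)

    def project(self, slot: str, step: int) -> tuple:
        # Stand-in for a real K/V projection; the tuple only tags its origin.
        self.computed += 1
        return (slot, step)

    def cached(self, slot: str) -> tuple:
        # Read a stored K/V entry without recomputing it.
        self.cache_reads += 1
        return self.cache[slot]


def cosmos_style(clean_slots, noised_slots, num_steps):
    """One backbone: clean-slot K/V cached once, noised-slot K/V redone per step."""
    kv = KVCounter()
    for slot in clean_slots:
        kv.cache[slot] = kv.project(slot, step=0)  # computed once, up front
    for step in range(num_steps):
        context = [kv.cached(s) for s in clean_slots]            # read only
        context += [kv.project(s, step) for s in noised_slots]   # recomputed
        # A real model would run attention over `context` here.
    return kv


def fast_wam_style(video_slots, num_action_steps):
    """Two backbones: the video pass fills the cache once, the action loop reads it."""
    kv = KVCounter()
    for slot in video_slots:
        kv.cache[slot] = kv.project(slot, step=0)  # single video-branch pass
    for step in range(num_action_steps):
        context = [kv.cached(s) for s in video_slots]  # action expert reads
        # The action expert's own tokens are not modeled in this sketch.
    return kv


# Assumed slot layout: 4 clean state slots plus one slot each for
# action, future, and value (illustrative only).
cosmos = cosmos_style(
    clean_slots=["state0", "state1", "state2", "state3"],
    noised_slots=["action", "future", "value"],
    num_steps=5,
)
fast_wam = fast_wam_style(video_slots=["first_obs"], num_action_steps=10)

print(cosmos.computed, cosmos.cache_reads)      # 19 20
print(fast_wam.computed, fast_wam.cache_reads)  # 1 10
```

Under these assumptions, the Cosmos-style loop computes 4 + 5 × 3 = 19 slot projections instead of the 5 × 7 = 35 it would need without a cache, and the Fast-WAM-style loop computes the video K/V once while the action loop reads it 10 times.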