How the velocity field is supervised
Linearly interpolate between initial noise $\mathbf{z}$ and the TD target $y(s_i, a_i) = r + \gamma Q_{\theta^{\text{old}}}(s_{i+1}, a_{i+1})$, forming intermediate points $z_t = (1-t)\,\mathbf{z} + t\,y$.
The velocity network learns to predict the direction $y - \mathbf{z}$ at each point along this path.
At $t=0$: pure noise. At $t=1$: pure target. In between: interpolation.
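A minimal numpy sketch of this supervision, assuming a scalar target, a uniform noise interval, and the standard linear interpolant $z_t = (1-t)\mathbf{z} + t\,y$; the target value, interval, and batch size are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
y = 2.0                           # illustrative TD target for one (s, a) pair
l, u = -1.0, 1.0                  # initial-noise interval (assumed values)

z = rng.uniform(l, u, size=256)   # initial noise samples
t = rng.uniform(0.0, 1.0, size=256)
z_t = (1.0 - t) * z + t * y       # points along the straight interpolation path
v_target = y - z                  # direction the velocity network must predict

# A velocity network v_theta would then be trained by regression:
#   loss = mean((v_theta(z_t, t) - v_target) ** 2)
```

At `t = 0` the interpolant reduces to `z` and at `t = 1` to `y`, matching the endpoints above.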
When the initial noise interval $[l, u]$ is narrow
All paths are nearly straight: the velocity is close to a single constant, so the velocity field performs essentially the same computation at every integration step.
More integration steps do not help, since no additional information is gained per step.
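A toy Euler integrator makes the redundancy concrete (the numbers are illustrative): with a constant velocity field, one step and sixteen steps land on the same endpoint, so the extra steps just repeat the same computation.

```python
y, z = 2.0, -0.5        # illustrative target and noise sample
v = y - z               # constant velocity along a straight path

def euler_constant(x0, v, steps):
    """Euler integration of dx/dt = v from t=0 to t=1."""
    x, dt = x0, 1.0 / steps
    for _ in range(steps):
        x += dt * v     # identical computation at every step
    return x

print(euler_constant(z, v, 1))    # one step
print(euler_constant(z, v, 16))   # sixteen steps: same endpoint
```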
When the initial noise interval $[l, u]$ is wide
Paths are curved — the velocity at each step depends on the current position $z_t$.
Each integration step does different computation, so more steps actually help!
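For intuition, under the linear interpolant with a single deterministic target $y$, the marginal velocity field works out to $v(x, t) = (y - x)/(1 - t)$, which depends on the current position. The sketch below (derived under these assumptions, not taken from floq's code; numbers are illustrative) shows that such a position-dependent field computes a new velocity at every step and pulls any starting noise sample to the target:

```python
y = 2.0  # illustrative target

def v(x, t):
    # position-dependent velocity, derived for a linear interpolant
    # with a deterministic scalar target y (an assumption for this demo)
    return (y - x) / (1.0 - t)

def euler(x0, steps):
    x = x0
    for k in range(steps):
        x += (1.0 / steps) * v(x, k / steps)  # new velocity each step
    return x

# Different starting noise samples are all transported to the target:
# because the field reads the current position, later steps correct
# errors made by earlier ones, so extra steps do useful work.
print(euler(-1.0, 8), euler(0.7, 8))
```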
Narrow interval: same velocity everywhere → redundant compute.
Wide interval: different velocity at each step → useful compute.
Key idea: widen the noise distribution to force curved paths
This is what makes test-time compute scaling possible in floq.