How the velocity field is supervised
Linearly interpolate between initial noise $\mathbf{z}$ and the TD target $y(s_i, a_i) = r + \gamma Q_{\theta^{\text{old}}}(s_{i+1}, a_{i+1})$, forming intermediate points $z_t = (1-t)\,\mathbf{z} + t\,y$.
The velocity network learns to predict the direction $y - \mathbf{z}$ at each point along this path.
At $t=0$: pure noise. At $t=1$: pure target. In between: interpolation.
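A minimal numpy sketch of this supervision, assuming a scalar target, a uniform noise interval, and the standard linear interpolant $z_t = (1-t)\mathbf{z} + t\,y$; the target value, interval, and batch size are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
y = 2.0                           # illustrative TD target for one (s, a) pair
l, u = -1.0, 1.0                  # initial-noise interval (assumed values)

z = rng.uniform(l, u, size=256)   # initial noise samples
t = rng.uniform(0.0, 1.0, size=256)
z_t = (1.0 - t) * z + t * y       # points along the straight interpolation path
v_target = y - z                  # direction the velocity network must predict

# A velocity network v_theta would then be trained by regression:
#   loss = mean((v_theta(z_t, t) - v_target) ** 2)
```

At `t = 0` the interpolant reduces to `z` and at `t = 1` to `y`, matching the endpoints above.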
When the initial noise interval $[l, u]$ is narrow
All paths are nearly straight: the velocity is close to a single constant, so the velocity field performs essentially the same computation at every integration step.
More integration steps do not help, since no additional information is gained per step.
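A toy Euler integrator makes the redundancy concrete (the numbers are illustrative): with a constant velocity field, one step and sixteen steps land on the same endpoint, so the extra steps just repeat the same computation.

```python
y, z = 2.0, -0.5        # illustrative target and noise sample
v = y - z               # constant velocity along a straight path

def euler_constant(x0, v, steps):
    """Euler integration of dx/dt = v from t=0 to t=1."""
    x, dt = x0, 1.0 / steps
    for _ in range(steps):
        x += dt * v     # identical computation at every step
    return x

print(euler_constant(z, v, 1))    # one step
print(euler_constant(z, v, 16))   # sixteen steps: same endpoint
```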
When the initial noise interval $[l, u]$ is wide
Paths are curved — the velocity at each step depends on the current position $z_t$.
Each integration step does different computation, so more steps actually help!
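For intuition, under the linear interpolant with a single deterministic target $y$, the marginal velocity field works out to $v(x, t) = (y - x)/(1 - t)$, which depends on the current position. The sketch below (derived under these assumptions, not taken from floq's code; numbers are illustrative) shows that such a position-dependent field computes a new velocity at every step and pulls any starting noise sample to the target:

```python
y = 2.0  # illustrative target

def v(x, t):
    # position-dependent velocity, derived for a linear interpolant
    # with a deterministic scalar target y (an assumption for this demo)
    return (y - x) / (1.0 - t)

def euler(x0, steps):
    x = x0
    for k in range(steps):
        x += (1.0 / steps) * v(x, k / steps)  # new velocity each step
    return x

# Different starting noise samples are all transported to the target:
# because the field reads the current position, later steps correct
# errors made by earlier ones, so extra steps do useful work.
print(euler(-1.0, 8), euler(0.7, 8))
```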
Narrow interval: same velocity everywhere → redundant compute.
Wide interval: different velocity at each step → useful compute.
Key idea: widen the noise distribution to force curved paths
This is what makes test-time compute scaling possible in floq.