The Core Idea: Transport Noise into Data

Flow matching learns a velocity field that transports particles from a simple noise distribution to a target data distribution.

Each particle follows a curved path from its noise position (top) to its data position (bottom).

Paths curve because the learned velocity at each point averages over all conditional paths passing through that region.

A velocity field $v_\theta(t, z)$ tells each particle which direction to move at every moment.

The Flow ODE

A flow is defined by an ordinary differential equation:

$$\frac{dz}{dt} = v_\theta(t, z)$$

Starting from $z_0 \sim p_{\text{noise}}$, we integrate this ODE from $t=0$ to $t=1$.

In practice, we use Euler discretization with $K$ steps:

$$z_{k+1} = z_k + \frac{1}{K}\, v_\theta\!\left(\frac{k}{K},\, z_k\right)$$

Each step is just a forward pass through the network $v_\theta$, applied to the current position.

More steps $K$ = finer approximation of the continuous flow = more compute.
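The Euler update above can be sketched in a few lines. This is a minimal NumPy sketch, assuming `v_theta` is any callable `(t, z) -> velocity` (here a toy stand-in, not a trained network):

```python
import numpy as np

def euler_sample(v_theta, z0, K):
    """Integrate dz/dt = v_theta(t, z) from t=0 to t=1 with K Euler steps."""
    z = z0
    for k in range(K):
        # One network forward pass per step, evaluated at time k/K.
        z = z + (1.0 / K) * v_theta(k / K, z)
    return z

# Toy check: a constant unit velocity field moves every particle by +1.
z1 = euler_sample(lambda t, z: np.ones_like(z), np.zeros(3), K=10)
```

With a constant field, Euler is exact for any K; for a genuinely time- and state-dependent field, larger K gives a finer approximation at the cost of more forward passes.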

Training: The Conditional Flow-Matching Trick

We don't need to solve the ODE during training!

The key insight: instead of learning the marginal velocity field, we learn conditional velocities.

Given a noise sample $z_0$ and a target data point $x_1$, define a straight-line path:

$$z(t) = (1-t)\, z_0 + t\, x_1 \qquad \text{(interpolant)}$$

The velocity along this path is simply:

$$u(t \mid z_0, x_1) = x_1 - z_0 \qquad \text{(constant direction)}$$

Train the network to predict this velocity at the interpolated point:

$$\mathcal{L}(\theta) = \mathbb{E}_{t,\, z_0,\, x_1}\!\left[\left\lVert v_\theta(t, z(t)) - (x_1 - z_0)\right\rVert^2\right]$$

Note: the straight-line interpolants are the training targets. The actual learned flow paths are curved, because the network sees many different $(z_0, x_1)$ pairs and learns to average — routing all particles simultaneously.
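The whole training step fits in a few lines, since no ODE solve is needed. A minimal NumPy sketch of the conditional flow-matching loss, assuming independent noise–data pairs and `v_theta(t, z) -> velocity` (illustrative interface, not a specific library's API):

```python
import numpy as np

def cfm_loss(v_theta, x1_batch, rng):
    """Monte Carlo estimate of the conditional flow-matching loss."""
    z0 = rng.standard_normal(x1_batch.shape)   # noise samples z0 ~ N(0, I)
    t = rng.uniform(size=(len(x1_batch), 1))   # random times t ~ Unif[0, 1)
    zt = (1 - t) * z0 + t * x1_batch           # straight-line interpolant z(t)
    target = x1_batch - z0                     # constant conditional velocity
    pred = v_theta(t, zt)
    return np.mean(np.sum((pred - target) ** 2, axis=-1))
```

In an actual training loop, `v_theta` would be a neural network and this scalar would be minimized by gradient descent; the point here is that each training step is a single forward pass plus a pointwise regression target.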

Why Does This Simple Loss Work?

A remarkable theoretical result (Lipman et al., 2023)

The conditional flow-matching loss has the same gradients as the intractable marginal loss.

Marginal FM Loss (intractable)

$\mathbb{E}_t\!\left[\lVert v_\theta(t,z) - u_t(z)\rVert^2\right]$

Requires knowing the true marginal velocity $u_t(z)$ — which depends on the entire data distribution.

Conditional FM Loss (tractable)

$\mathbb{E}_{t, z_0, x_1}\!\left[\lVert v_\theta(t,z(t)) - (x_1 - z_0)\rVert^2\right]$

Only needs pairs $(z_0, x_1)$ and a random $t$. Easy to compute!

Same gradients $\Rightarrow$ same optimal $v_\theta$. We get the full flow for free from pointwise supervision.

This is why flow matching is so popular: simple loss, no ODE solve during training, theoretically sound.

Straight Training, Curved Inference

Why does training on straight lines produce curved paths at inference?

At each point $(t, z)$, multiple training pairs pass through with different straight-line velocities.

The network learns their average — which differs from any single training target.

Integrating these averaged velocities step-by-step produces a curved inference path.
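This averaging effect can be verified numerically. A small sketch with two illustrative conditional velocities meeting at the same point (the values are made up for the example):

```python
import numpy as np

# Two training pairs that happen to pass through the same point (t, z),
# with different straight-line velocities:
v_a = np.array([1.0, 0.0])
v_b = np.array([0.0, 1.0])

def pointwise_mse(v):
    # Expected squared error at this point, both pairs equally likely.
    return 0.5 * np.sum((v - v_a) ** 2) + 0.5 * np.sum((v - v_b) ** 2)

# The MSE-optimal prediction is the conditional average [0.5, 0.5],
# which matches neither individual training target.
v_mean = 0.5 * (v_a + v_b)
```

Because the optimal prediction is the average, the learned field departs from every individual straight-line target wherever paths cross, and integrating it bends the inference trajectory.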

Standard Flow Matching vs. floq

Same framework, very different application

Standard Flow Matching

Goal: generate images, molecules, ...

Dimension: high-dimensional $z \in \mathbb{R}^d$

Noise: $z_0 \sim \mathcal{N}(0, I)$

Target: data samples $x_1 \sim p_{\text{data}}$

Trained by: flow-matching regression loss

floq (Flow Q-Functions)

Goal: predict Q-values

Dimension: scalar $z \in \mathbb{R}$

Noise: $z_0 \sim \text{Unif}[l, u]$ (an interval wider than the Q-value range)

Target: TD target $y(s,a) = r + \gamma\, Q^{\text{old}}(s', a')$

Trained by: TD-learning loss

floq borrows the iterative computation structure of flows, not the generative modeling goal.

The point is test-time compute scaling, not distribution modeling.
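The floq-style usage can be sketched by reusing the same Euler loop on a scalar. This is a hedged sketch only: the conditioning signature `v_theta(t, z, s, a)`, the step count `K=8`, and the variable names are illustrative assumptions, not the paper's exact interface:

```python
import numpy as np

def q_value(v_theta, s, a, z0, K=8):
    """Predict a scalar Q-value by integrating a velocity field
    conditioned on the state-action pair (s, a)."""
    z = z0  # z0 ~ Unif[l, u], a scalar
    for k in range(K):
        # Each step is one forward pass; K controls test-time compute.
        z = z + (1.0 / K) * v_theta(k / K, z, s, a)
    return z

# TD target built from a frozen copy of the flow (hypothetical names):
# y = r + gamma * q_value(v_theta_old, s_next, a_next, z0_next)
```

The regression target is the scalar TD target rather than a data sample, so the flow is supervised by TD learning, not by matching a data distribution.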