Flow matching · optimal transport · a machine-checked dichotomy

When is flow matching the same as optimal transport?

Run a flow-matching ODE from one Gaussian to another. Run optimal transport between the same two Gaussians. The two maps coincide if and only if the covariances commute — and in that case the map is the simple Bures form, independent of the schedule you chose.

\[ T_1 \;=\; T_{\mathrm{OT}} \quad\Longleftrightarrow\quad \Sigma_0\Sigma_1=\Sigma_1\Sigma_0 \]

and in the commuting case $ \, T_1 = \big(\Sigma_1\Sigma_0^{-1}\big)^{1/2} $ — symmetric, equal to the Bures–Wasserstein map, the same for every valid schedule.

This proof's first version was wrong. The fleet's own adversary caught an invalid converse argument and a gap in dimension three and up. We kept the receipt of the fix — and then closed the hard direction for real, unconditionally. Anyone can show a green check; almost no one shows the red they passed through to get there.

The verdict, from the kernel — not from us

Below is the real output the Lean kernel and the audit produce. The machine-checked commuting case carries only the three foundational axioms: no sorry, no project-local axiom standing in for a missing argument. Every number in the status strip and in the output below is captured from a command at build time; see §7 Reproduce / verify.

# lake build — anchor theorem accepted by the Lean kernel (exit 0)
$ lake build
Build completed successfully.
exit 0

# the source the kernel verified is byte-identical to the shipped source
$ git hash-object lean/FMG/Gaussian.lean
760ec9b3afb37978b6acb840a260cff0ae283ea6 ✓ matches kernel-provenance.log

# #print axioms — only the three standard foundational axioms
$ #print axioms FMG.flowMap_isHermitian_pushforward_eq_otMap_of_commute
⇒ [propext, Classical.choice, Quot.sound] -- no sorryAx, no project-local axiom

# adversarial corpus — author ≠ scorer; the kernel's exit code is the verdict
$ bash FMG/Adversarial/check-adversarial.sh
CORPUS OK — 9 / 9 resolved as the manifest predicts
(8 unfaithful variants rejected · 1 axiom-smuggling build caught by the audit)

Source repository ↗

github.com/noogram-labs/flow-matching-gaussians — notebooks, paper (.tex), the Lean project, and the report renderer. Clone it, run it, check the proof yourself.

noogram.org ↗

The fleet that produced this. Every artifact on this page was written and reviewed by a Noogram agent fleet, with a verifier the agents could not argue with.

§1 · The result

For the independent-coupling Gaussian interpolant $ X_t = a(t)X_0 + b(t)X_1 $, the marginal covariance is $ \Sigma_t = a(t)^2\Sigma_0 + b(t)^2\Sigma_1 $, the advection velocity is linear, $ v(x,t)=A_t x $ with $ A_t = \tfrac12\dot\Sigma_t\Sigma_t^{-1} $, and the flow map $ \Phi_t $ it generates pushes $ \mathcal N(0,\Sigma_0) $ to $ \mathcal N(0,\Sigma_t) $. Call its terminal value $ T_1 $. The Bures–Wasserstein optimal-transport map between the same Gaussians is $ T_{\mathrm{OT}}=\Sigma_0^{-1/2}\!\big(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2}\big)^{1/2}\Sigma_0^{-1/2} $.

The dichotomy. $ T_1 = T_{\mathrm{OT}} $ exactly when $ \Sigma_0 $ and $ \Sigma_1 $ commute. In the commuting case the flow map collapses to the schedule-independent Bures form $ T_1=(\Sigma_1\Sigma_0^{-1})^{1/2} $, which is symmetric. When the covariances do not commute, $ T_1 $ is non-symmetric and differs from the (always symmetric) OT map.
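A small numerical illustration of the dichotomy, offered as a sketch and not as the paper's or the notebooks' code. It assumes NumPy, takes the closed form $ T_1=(\Sigma_1\Sigma_0^{-1})^{1/2} $ quoted on this page for the flow map, and uses two hand-picked 2×2 covariance pairs; the helper names (`spd_sqrt`, `ot_map`, `flow_terminal_map`, `report`) are invented for the example.

```python
import numpy as np


def spd_sqrt(s):
    """Principal square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(s)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def ot_map(sigma0, sigma1):
    """Bures-Wasserstein map S0^{-1/2} (S0^{1/2} S1 S0^{1/2})^{1/2} S0^{-1/2}."""
    r = spd_sqrt(sigma0)
    r_inv = np.linalg.inv(r)
    return r_inv @ spd_sqrt(r @ sigma1 @ r) @ r_inv


def flow_terminal_map(sigma0, sigma1):
    """(S1 S0^{-1})^{1/2}, computed through the similar SPD matrix
    S0^{-1/2} S1 S0^{-1/2} so only symmetric square roots are needed."""
    r = spd_sqrt(sigma0)
    r_inv = np.linalg.inv(r)
    return r @ spd_sqrt(r_inv @ sigma1 @ r_inv) @ r_inv


def report(sigma0, sigma1):
    t1, t_ot = flow_terminal_map(sigma0, sigma1), ot_map(sigma0, sigma1)
    return {
        "commute": bool(np.allclose(sigma0 @ sigma1, sigma1 @ sigma0)),
        "t1_symmetric": bool(np.allclose(t1, t1.T)),
        "t1_equals_ot": bool(np.allclose(t1, t_ot)),
    }


# Commuting pair: shared eigenvectors (a rotation q), different eigenvalues.
theta = 0.7
q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
commuting = (q @ np.diag([1.0, 4.0]) @ q.T, q @ np.diag([9.0, 2.0]) @ q.T)

# Non-commuting pair: the eigenvectors differ.
non_commuting = (np.diag([1.0, 4.0]), np.array([[2.0, 1.0], [1.0, 3.0]]))

print(report(*commuting))
print(report(*non_commuting))
```

On these two pairs all three flags come out true for the commuting pair and false for the non-commuting one. That is a spot check on two examples, not a substitute for the proof.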

The honest boundary, up front

Machine-checked in Lean: the forward, commuting direction — symmetry of $ T_1 $, the Bures pushforward identity, and $ T_1 = T_{\mathrm{OT}} $. lake build exits 0; #print axioms shows only the three foundational axioms.
Informal in Lean, mathematically complete: three companion results (R1 the matrix-ODE ↔ algebraic-map bridge, R2 schedule-independence, R3 the $ d\ge 3 $ converse) live in the paper under the axiom firebreak discipline — honest informal arguments rather than faked Lean terms, because the Mathlib infrastructure they need (time-ordered exponentials, path-ordered holonomy) does not yet exist.
R3 is closed — unconditionally, in every dimension. What looked like the hard direction turned out to hinge on a hidden closed form: on the straight-line schedule the flow's drift matrices commute, time-ordering collapses, and $ T_1=(\Sigma_1\Sigma_0^{-1})^{1/2} $ is algebraic. Symmetry of $ T_1 $ then forces $ \Sigma_1\Sigma_0^{-1} $ symmetric — exactly commutation. No dimension restriction, no conditioning hypothesis, no residual gap. The fleet caught its own gap here and then closed it.
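One way to see the closed form behind R3, written out as a sketch and not as the paper's own proof. It assumes the straight-line schedule $ a(t)=1-t $, $ b(t)=t $ and uses the shorthand $ M=\Sigma_1\Sigma_0^{-1} $, a label introduced only for this sketch. Factoring $ \Sigma_0 $ out on the right of $ \Sigma_t $ gives

\[ \Sigma_t=(a^2 I+b^2 M)\,\Sigma_0, \qquad A_t=\tfrac12\dot\Sigma_t\Sigma_t^{-1}=(a\dot a\,I+b\dot b\,M)\,(a^2 I+b^2 M)^{-1}. \]

Every $ A_t $ is a function of the single matrix $ M $, so $ A_sA_t=A_tA_s $ for all $ s,t $. The ODE $ \dot\Phi_t=A_t\Phi_t $, $ \Phi_0=I $ then integrates without time-ordering:

\[ \Phi_t=(a^2 I+b^2 M)^{1/2}, \qquad T_1=\Phi_1=M^{1/2}=(\Sigma_1\Sigma_0^{-1})^{1/2}. \]

The principal root exists because $ M $ is similar to the positive-definite matrix $ \Sigma_0^{-1/2}\Sigma_1\Sigma_0^{-1/2} $. If $ T_1 $ is symmetric, then so is $ M=T_1^2 $, and

\[ \Sigma_1\Sigma_0^{-1}=(\Sigma_1\Sigma_0^{-1})^{\top}=\Sigma_0^{-1}\Sigma_1 \quad\Longrightarrow\quad \Sigma_0\Sigma_1=\Sigma_1\Sigma_0, \]

where the last step multiplies by $ \Sigma_0 $ on the left and on the right. Nothing in this chain refers to the dimension.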

§2 · Notebooks

Two pedagogical PyTorch notebooks, executed end-to-end and rendered to self-contained HTML. The source .ipynb files live in the repo's python/.

A — three-Dirac mixture →

The marginal stays a closed-form Gaussian mixture at every instant. Trajectories coloured by which atom they fall into, three schedules contrasted, flow-matching paths set against semi-discrete optimal transport.

B — covariance ellipses →

The Gaussian-to-Gaussian case: $ \Sigma_t = a^2\Sigma_0 + b^2\Sigma_1 $ drawn as evolving ellipses, commuting vs non-commuting covariance pairs, and the commutator sweep behind the converse.
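For readers who want the ellipse geometry without opening the notebook, here is a minimal sketch of how a covariance ellipse can be read off $ \Sigma_t $. It uses NumPy instead of the notebooks' PyTorch, assumes the straight-line schedule $ a=1-t $, $ b=t $, and the function name `covariance_ellipse` is hypothetical, not taken from the repo.

```python
import numpy as np


def covariance_ellipse(sigma0, sigma1, t):
    """Return Sigma_t, the one-sigma semi-axes, and the major-axis angle."""
    a, b = 1.0 - t, t                        # assumed straight-line schedule
    sigma_t = a**2 * sigma0 + b**2 * sigma1
    eigvals, eigvecs = np.linalg.eigh(sigma_t)   # eigenvalues in ascending order
    semi_axes = np.sqrt(eigvals)             # one-sigma ellipse radii
    major = eigvecs[:, -1]                   # direction of the largest radius
    angle = np.arctan2(major[1], major[0])   # defined up to a flip by pi
    return sigma_t, semi_axes, angle


sigma0 = np.diag([4.0, 1.0])
sigma1 = np.diag([1.0, 9.0])
for t in (0.0, 0.5, 1.0):
    _, semi_axes, _ = covariance_ellipse(sigma0, sigma1, t)
    print(t, semi_axes)
```

The semi-axes are the square roots of the eigenvalues of $ \Sigma_t $, and the eigenvectors give the orientation. For a commuting pair the eigenvectors stay fixed while the radii change; for a non-commuting pair the ellipse also rotates.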

§3 · The paper

The full write-up — setup, the three-Dirac experiment, the Gaussian section with the theorem and its proof, and the numerical evidence. Download the PDF · the LaTeX source (paper-v1.tex + references.bib) is in the repo.

§4 · The proof — the credibility anchor

The Lean source, not a screenshot. The fidelity anchor is the theorem FMG.flowMap_isHermitian_pushforward_eq_otMap_of_commute, proved basis-free through the continuous functional calculus (so the repeated-eigenvalue cases that sink the naive diagonalisation never arise).

`FMG/Gaussian.lean` →

The anchor theorem and its lemmas, syntax-highlighted. flowMap σ₀ σ₁ is Hermitian, solves the Bures pushforward equation, and equals the OT map.

`FMG/Basic.lean` →

The library's base module — the import surface for FMG.lean.

The firebreak ledger

The disclosed gaps, each a clearly-scoped infrastructure or out-of-scope item — never the main theorem, never an axiom smuggled into the library. Full detail in lean/unproved.md.

Residual	What it is	Status
R1	Matrix-ODE $ \Phi_1 = $ flowMap bridge (needs time-ordered-exponential machinery Mathlib lacks)	informal — infrastructure gap
R2	Schedule-independence / flatness (rides on R1; structurally already true of the Lean object)	informal — out of Lean scope
R3	The $ d\ge 3 $ converse $ T_1 $ symmetric $ \Rightarrow [\Sigma_0,\Sigma_1]=0 $	closed unconditionally — informal in Lean, math complete

The adversary that rejects axiom-smuggling

A green build does not certify a faithful proof if the context is poisoned. The corpus's ninth entry, A9_AxiomSmuggling.lean, builds clean by smuggling a false universal in as an axiom and feeding it to the verified theorem — and is caught not by the exit code but by the #print axioms audit. Author ≠ scorer: the red team writes the variants, the checker grades exit codes against the manifest.

-- A9_AxiomSmuggling.lean (excerpt) — the smuggled false universal
axiom A9_commute_everything {n : Type*} [Fintype n] [DecidableEq n]
    (σ₀ σ₁ : Matrix n n ℝ) : Commute σ₀ σ₁
-- builds with exit 0, but #print axioms A9_smuggled reveals the
-- non-standard dependency → the axiom-grep gate fails the build.
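The idea of an axiom gate can be sketched in a few lines. This is not the repo's `check-adversarial.sh`; it is a hypothetical Python helper that assumes the `#print axioms` message lists the axioms inside one pair of square brackets, and the two sample messages are illustrative strings, not captured output.

```python
import re

# The three foundational axioms the page treats as acceptable.
ALLOWED_AXIOMS = {"propext", "Classical.choice", "Quot.sound"}


def non_standard_axioms(print_axioms_output):
    """Return the axiom names in a `#print axioms` message that are not allowed.

    Assumes the axioms appear inside one pair of square brackets; a message
    without brackets is read as "depends on no axioms".
    """
    match = re.search(r"\[(.*?)\]", print_axioms_output, flags=re.DOTALL)
    if match is None:
        return set()
    names = {name.strip() for name in match.group(1).split(",")}
    names.discard("")
    return names - ALLOWED_AXIOMS


# Illustrative messages (assumed format, not copied from a real build log).
clean = ("'FMG.flowMap_isHermitian_pushforward_eq_otMap_of_commute' "
         "depends on axioms: [propext, Classical.choice, Quot.sound]")
smuggled = ("'A9_smuggled' depends on axioms: "
            "[A9_commute_everything, propext, Classical.choice, Quot.sound]")

print(non_standard_axioms(clean))     # set()
print(non_standard_axioms(smuggled))  # {'A9_commute_everything'}
```

A gate built this way fails whenever the returned set is non-empty. Because `sorryAx` is not in the allowed set, a proof that still contains `sorry` would be flagged by the same check.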

Gate	Verdict	Evidence (build-time)
`lake build`	exit 0	anchor accepted; shipped source blob-matches `kernel-provenance.log` (`760ec9b…`)
0-sorry	clean	`grep -rn sorry` over the paper theorem files → no match
no project axiom	clean	only `propext, Classical.choice, Quot.sound`
adversarial corpus	9 / 9	8 rejected + 1 axiom-smuggling build flagged by the audit

§5 · The fleet that watched itself work

Strip away the science and here is the meta-story: a swarm of software agents was given one question and told to deliver a publishable result — and it kept a logbook of its own motion while it worked. The arc is a soap bubble: polymerisation (the opening plan fans out), foaming (children get nucleated mid-flight, the froth swells), drainage (the foam settles, green fills the frame). Every number is pulled from a real field in the fleet's event log; where a number under-counts, the chart says so out loud — and we leave the caveat on.

Two of those views are now interactive and in this page's own charter — hover any role or any bar for detail, or open the full living-document page: the fleet, up close ↗. They are generated straight from the real .cosmon/state/events.jsonl + runtime-trace.jsonl by report/fleet_viz.py.

A · The role graph. Rounded boxes are roles, one colour per sub-fleet (cognition green, formal violet, instrumentation cyan). The left FLEET STATE panel is the shared blackboard; the dashed ENTRY is the mission, the dashed green EXIT the artifacts. The referee — red-team — exists only to attack the others' work; it caught the invalid proof step and the d≥3 gap. Hover any box for its charge.

B · The whole run. One row per molecule; bar length is real tackle→settle duration on a wall-clock axis; ticks are step completions, small squares worker dispatches. Outcome reads off the fill — solid completed, red hatch collapsed (a redundant / junk-probe molecule), amber outline stuck (an operator safety-gate), faint dashed pending (not yet unblocked). The collapsed/stuck/pending story is legible at a glance.

The same arc, told once more in seven static charts — each with its caveat left on:

DAG growth — molecules by state over wall-clock — **1 · Polymerisation.** The dashed line is the total tasks ever created — it climbs in three steps: 17 tasks laid down at once (the plan), a long flat drain, then a second burst of 5 unplanned repair nodes near 19:50. The dynamic DAG rewriting itself while running. Green (completed) rising to fill the frame is drainage.

Molecule Gantt — one bar per task — **2 · The Gantt.** Each bar is one task's whole life, coloured by sub-fleet — cognition (science + words), formal (the Lean proof), instrumentation (this report). The long left bars are deliberations; the short late formal bars are the surgical proof-patches, nucleated late, closed fast.

Frontier width — the foaming signal — **3 · Foaming.** Live foam — queued plus running — spikes to 17 (the whole opening science plan), then ebbs 17 → 12 → 11 → 6, with a bump near 19:45 as the proof-patch children arrive. Caveat printed on the chart itself: the true frontier is ready-vs-blocked, but the event log records only state counts, not dependency edges over time. This draws the measurable proxy (pending + running) and says so. We draw what we can measure and name what we can't.

Cumulative framing-bytes — a transparent cost proxy — **4 · What it cost — ≈ $596 (≈ €549).** The science DAG was 20 molecules, but the whole run was 59 across four polymerisations — and it burned ≈ 548M tokens across 5,309 assistant turns (78 worker sessions: 53 top-level + 25 subagents, all on `claude-opus-4-8`), summed from the per-message `usage` blocks in the session transcripts and priced at Anthropic's published rates: $19.63 input + $139.44 output + $259.69 cache-read + $177.67 cache-write (5-min + 1-hour tiers). The curve above is the *bytes* proxy from `costs.csv`; the dollars come from the transcripts. Three audit trails, named honestly: `cs ensemble` prints live per-worker INPUT/OUTPUT/COST, `events.jsonl` + `runtime-trace.jsonl` log every transition, and `costs.csv` is a sealed-bytes proxy — the transcripts are the real-token source of truth. 94.8% of token volume is cache reads; caching turned a ≈ $2,597 input bill into $260 (≈ $2,337 saved). This snapshot necessarily excludes this reconciliation run's own tokens.

Persona activity — workers spawned per persona — **5 · Who did the work.** Each bar is a named persona and how many workers it spawned — a sourcer, a proofsmith (twice), a skeptic hunting the dangerous wrong-answer, auditors, and a cluster of proof-patchers. Caveat on the chart: cosmon stamps every worker's role field with the same value, so the honest unit is the session name the persona ran under — which is what's plotted. Sub-fleet colour is inferred from each task's briefing, not a stored fact.

Critical path through the final DAG — **6 · The critical path.** The longest chain of strictly-ordered tasks — eleven tasks, roughly four hours end to end, from the opening literature-and-setup, through the deliberation that fixed the theorem's framing, out along the formal proof, into the instrumentation that drew these charts. Everything off the red line had slack; the red line could not.

Outcome mix — completed vs in-flight vs collapsed — **7 · The honest scorecard.** By sub-fleet, nothing hidden: cognition 12/12 completed; formal 4 completed, 1 in-flight, 1 collapsed; instrumentation 1 completed, 3 in-flight (the report was still draining when the snapshot was taken). In the science DAG at this snapshot: **17 tasks completed, 1 collapsed** — and that single collapse was a junk probe created by hand while poking at the system's JSON, recognised and discarded. The log records it faithfully rather than quietly deleting it.
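The dollar figures quoted for chart 4 can be re-added by hand. The sketch below only replays the arithmetic from the numbers on this page. The 10% cache-read ratio is an assumption of the example (cache reads billed at one tenth of the base input rate); it is not stated on the page, but it reproduces the quoted ≈ $2,597 and ≈ $2,337.

```python
# Cost components in USD, copied from the chart 4 caption.
cost_usd = {
    "input": 19.63,
    "output": 139.44,
    "cache_read": 259.69,
    "cache_write": 177.67,
}

total = sum(cost_usd.values())

# Assumption: a cache read costs 10% of the base input price.
CACHE_READ_RATIO = 0.10
uncached_equivalent = cost_usd["cache_read"] / CACHE_READ_RATIO
saved = uncached_equivalent - cost_usd["cache_read"]

print(f"total               ${total:,.2f}")
print(f"uncached equivalent ${uncached_equivalent:,.0f}")
print(f"saved by caching    ${saved:,.0f}")
```

The four components sum to $596.43, which matches the headline ≈ $596.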

The whole ethic in miniature: the fleet proved what it could prove to the hilt, marked the rest with a steady finger instead of a flourish — then went back and closed the hard one honestly, demoting the nine-thousand-pair numerical sweep from evidence to a sanity check the moment a real proof existed. The trust isn't in the agents — it's in the gate the agents couldn't argue with. The full narrative is in report/report.md.

§6 · The talk — an ENS seminar

The same result, told to a room. A reveal.js deck built to be presented at an ENS seminar — the dichotomy, the proof that was wrong first, and the fleet that watched itself work, in slide form. The deck's closing invitation points back here; this page points back to the deck. Read one object two ways.

Open the slides →

The full reveal.js deck in your browser. Arrow keys to advance, Esc for the slide grid, S for speaker notes.

Slides as PDF →

The same deck exported to a single self-contained slides.pdf — for reading offline or printing the handout.

The constellation map →

An interactive map of the Noogram fleet's public artifacts — verticals wired to the live things they produced. The deck's standalone companion.

§7 · Reproduce / verify

Everything on this page regenerates from the repo. The status strip and the terminal verdict above are produced mechanically by the build script from real command output — not hand-typed.

# clone and check the proof
git clone https://github.com/noogram-labs/flow-matching-gaussians
cd flow-matching-gaussians/lean
lake build                              # exit 0 = the anchor theorem is accepted
bash FMG/Adversarial/check-adversarial.sh   # 9/9 — author ≠ scorer

# regenerate the notebooks, the Lean HTML, and the paper copy
bash docs/site/artifacts/build.sh

# regenerate the seven report figures from the fleet's own event log
python3 report/collect.py               # .cosmon/state → report/data/*.csv (read-only)
python3 report/render.py                # report/data/*.csv → report/figures/*.png
python3 report/fleet_viz.py             # events.jsonl → figures/fleet_*.svg + fleet.html

# rebuild this website (refreshes the status strip from live commands)
bash docs/site/build.sh

Provenance	Value
Built from commit	`dd226f2` (`dd226f258a9db92be28e121349c1160bb3fab774`)
Build timestamp	2026-06-18 (mechanically stamped by `docs/site/build.sh`)
Anchor theorem	`FMG.flowMap_isHermitian_pushforward_eq_otMap_of_commute`
Lean source blob	`760ec9b3afb37978b6acb840a260cff0ae283ea6` — matches kernel-provenance.log
Lean toolchain	`leanprover/lean4:v4.29.0` · Mathlib `v4.29.0`
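The blob id in the table can be recomputed without Git. A Git blob id in a SHA-1 repository is the SHA-1 of the header `blob <size>\0` followed by the file bytes. The sketch below implements that; the function names are hypothetical, and it hashes the bytes exactly as they are on disk, so a checkout that rewrites line endings or applies other filters could produce a different id than `git hash-object` does.

```python
import hashlib


def git_blob_id(content):
    """SHA-1 that Git assigns to raw file bytes (no filters applied)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def matches_provenance(path, expected_id):
    """Compare a file on disk against a recorded blob id."""
    with open(path, "rb") as handle:
        return git_blob_id(handle.read()) == expected_id


# Self-check of the helper against the well-known id of the empty blob.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Run from the repository root, `matches_provenance("lean/FMG/Gaussian.lean", "760ec9b3afb37978b6acb840a260cff0ae283ea6")` would be the check that corresponds to the "Lean source blob" row.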

§8 · References — the closed citable set

The only set the writers may cite. Every identifier was verified against the authoritative record (arXiv abstract or registered DOI) — fabricating a DOI is a blocker fault, and none below are fabricated. Tier and relevance live in source-ledger.md; the machine-readable set is references.bib.

1. Building Normalizing Flows with Stochastic Interpolants. Michael S. Albergo, Eric Vanden-Eijnden · 2023 · ICLR
2. Stochastic Interpolants: A Unifying Framework for Flows and Diffusions. Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden · 2023
3. Flow Matching for Generative Modeling. Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le · 2023 · ICLR
4. Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. Xingchao Liu, Chengyue Gong, Qiang Liu · 2023 · ICLR
5. Score-Based Generative Modeling through Stochastic Differential Equations. Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon et al. · 2021 · ICLR
6. Computational Optimal Transport: With Applications to Data Science. Gabriel Peyré, Marco Cuturi · 2019
7. A Convexity Principle for Interacting Gases. Robert J. McCann · 1997
8. On the Bures–Wasserstein distance between positive definite matrices. Rajendra Bhatia, Tanvi Jain, Yongdo Lim · 2019
9. Wasserstein geometry of Gaussian measures. Asuka Takatsu · 2011 · Osaka J. Math.
10. Geometry of Matrix Decompositions Seen Through Optimal Transport and Information Geometry. Klas Modin · 2017
11. Polar Factorization and Monotone Rearrangement of Vector-Valued Functions. Yann Brenier · 1991
12. The distance between two random vectors with given dispersion matrices. Ingram Olkin, Friedrich Pukelsheim · 1982
13. Flow Matching Guide and Code. Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le et al. · 2024
