produced by a Noogram agent fleet · noogram.org
ENS · CSD coding-agents seminar · 2026-06-18
Applied mathematics, then quantitative research — physics PhD, then years inside quantitative research.
Teaching and research today — École Polytechnique, CMAP, the MaQI master.
One question today: can a fleet of agents do real mathematics — and catch itself when wrong?
A research programme, not a product — federative agentic AI, human-amplified, never replaced.
Live today — the public core is shipping, not promised.
commun · noyau · cliquet at the center — every spoke is a different craft run on the same open core. The green dots are live public artifacts: click one to open it.
Four principles, formally pinned — self-reference, transport-and-cognition, intentions-not-ownership, minimum-action; TLA+-validated.
Built by the method it runs — agents propose, a chief decides.
The Formula 1 car, not the driver — one mission it just ran.
One line in a to-do file — a researcher dropped a half-formed maths question into todo.md: flow matching versus optimal transport, on Gaussians.
We changed nothing — that line, untouched, is everything the fleet was given.
One chatbot is one researcher — fast, fluent, alone in the room with its own mistakes.
This is a research group — many agents split into branches, with a referee built in: a sub-fleet whose only job is to attack the others' work.
Three branches, one referee — a question in, the artifacts out; the red-team role exists only to attack the others' work.
17 tasks, written in one breath — the opening science plan, before a single worker moves. Four branches: notebooks, paper, proof, self-report.
It found its own mistake and rewrote the plan — 5 repair tasks it had never planned were added, wired backward into a branch it thought was done.
Schedule-independent; commutation = symmetry = the Bures–Wasserstein OT map: \(T_1=T_1^{\top}\iff[\Sigma_0,\Sigma_1]=0\iff T_1=T_{\mathrm{OT}}\).
No charisma gets past the kernel — a machine read the proof and agreed. This is the part that does not get to bluff.
A draft of the hard direction was wrong — an invalid step, plus a gap in dimension three and up. The adversarial sub-fleet flagged both.
We closed it, unconditionally — on the straight-line schedule a closed form forces commutation. No prose was patched.
flow-matching-gaussians.pages.dev — the proof, the notebooks, the paper: all live, all generated by the fleet.
The trust is not in the agents — it is in the gate they could not argue with.
Everything past this point is past the end — reachable during Q&A with the arrow keys or slide overview.
Orthogonal factorisation — the flow map splits as \(\Phi_t=\Sigma_t^{1/2}O_t\Sigma_0^{-1/2}\) with \(O_t\in SO(d)\) and \(O_0=I\).
Time-ordering is the obstruction — \(T_1\) is symmetric iff the residual rotation \(O_1=I\); off the commuting locus the time-ordered (Magnus) exponential leaves a real rotation behind.
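A minimal numerical sketch of the residual rotation, assuming centred Gaussians and the affine-schedule closed form \(T_1=(\Sigma_1\Sigma_0^{-1})^{1/2}\) used in this deck. The helper names (`spd_sqrt`, `affine_flow_map`, `residual_rotation`) and the 2×2 test matrices are illustrative assumptions, not part of the fleet's notebooks. It solves the factorisation for \(O_1=\Sigma_1^{-1/2}T_1\Sigma_0^{1/2}\) and compares a commuting pair with a non-commuting one.

```python
import numpy as np

def spd_sqrt(S):
    """Principal square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def affine_flow_map(S0, S1):
    """T_1 = (S1 S0^{-1})^{1/2}, computed through the similar SPD matrix
    S0^{-1/2} S1 S0^{-1/2} so that only symmetric eigensolvers are needed."""
    R0 = spd_sqrt(S0)
    R0_inv = np.linalg.inv(R0)
    return R0 @ spd_sqrt(R0_inv @ S1 @ R0_inv) @ R0_inv

def residual_rotation(S0, S1):
    """O_1 from the factorisation T_1 = S1^{1/2} O_1 S0^{-1/2}."""
    T1 = affine_flow_map(S0, S1)
    return np.linalg.inv(spd_sqrt(S1)) @ T1 @ spd_sqrt(S0)

# Illustrative covariances (assumptions): S0 commutes with S1_comm, not with S1_gen.
S0 = np.diag([1.0, 4.0])
S1_comm = np.diag([9.0, 1.0])
S1_gen = np.array([[2.0, 1.0], [1.0, 3.0]])

O_comm = residual_rotation(S0, S1_comm)
O_gen = residual_rotation(S0, S1_gen)

# Rotation angle of the 2x2 residual: zero on the commuting locus, nonzero off it.
angle_comm = np.arctan2(O_comm[1, 0], O_comm[0, 0])
angle_gen = np.arctan2(O_gen[1, 0], O_gen[0, 0])
print("commuting angle:", angle_comm)
print("non-commuting angle:", angle_gen)
```

In this sketch the commuting pair gives \(O_1=I\) up to round-off, while the non-commuting pair leaves a small but genuine rotation in \(SO(2)\).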
On the affine schedule it is algebraic — \(T_1=(\Sigma_1\Sigma_0^{-1})^{1/2}\), so symmetry forces \(\Sigma_1\Sigma_0^{-1}\) symmetric, i.e. \([\Sigma_0,\Sigma_1]=0\).
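A short derivation sketch of that closed form. It rests on assumptions not spelled out on the slide: centred Gaussians, independent coupling \(x_t=(1-t)x_0+t\,x_1\), and a linear velocity field \(v_t(x)=A_t x\) with \(A_t=\tfrac12\dot\Sigma_t\Sigma_t^{-1}\). The symbol \(M\) is introduced here for convenience.

```latex
\begin{align*}
\Sigma_t &= (1-t)^2\Sigma_0 + t^2\Sigma_1
          = \bigl((1-t)^2 I + t^2 M\bigr)\Sigma_0,
          \qquad M := \Sigma_1\Sigma_0^{-1},\\
A_t &= \tfrac12\,\dot\Sigma_t\,\Sigma_t^{-1}
     = \bigl(tM-(1-t)I\bigr)\bigl((1-t)^2 I+t^2 M\bigr)^{-1},\\
\int_0^1 \frac{tm-(1-t)}{(1-t)^2+t^2 m}\,dt
    &= \Bigl[\tfrac12\log\bigl((1-t)^2+t^2 m\bigr)\Bigr]_0^1
     = \tfrac12\log m \qquad (m>0),\\
T_1 &= \exp\Bigl(\int_0^1 A_t\,dt\Bigr)
     = \exp\bigl(\tfrac12\log M\bigr)
     = M^{1/2} = (\Sigma_1\Sigma_0^{-1})^{1/2}.
\end{align*}
```

Every \(A_t\) is a function of the single matrix \(M\), so the generators commute with one another and the time-ordered exponential reduces to an ordinary one. \(M\) is similar to the positive-definite matrix \(\Sigma_0^{-1/2}\Sigma_1\Sigma_0^{-1/2}\), so the scalar integral applies eigenvalue by eigenvalue.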
Mathlib, kernel-checked — the commuting case is a machine-verified anchor; the shipped source blob matches kernel-provenance.log.
Author is not the scorer — a nine-category adversarial corpus probes the proof: eight must be rejected, one must build.
Linear, variance-preserving, cosine — three interpolation schedules \((a,b)\) between the same endpoints.
The map does not care — Proposition S: \(T_1\) is identical across all three (zero curvature of the connection \(\omega=\tfrac12\,d\Sigma\,\Sigma^{-1}\)).
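The schedule-independence claim can be probed numerically. The sketch below assumes centred Gaussians with \(\Sigma_t=a^2\Sigma_0+b^2\Sigma_1\) and generator \(\tfrac12\dot\Sigma_t\Sigma_t^{-1}\), and integrates \(\dot\Phi_t=A_t\Phi_t\) with a hand-written RK4 loop. The three parametrisations of \((a,b)\) are assumptions chosen for illustration, since the slide does not give formulas, and all function names are hypothetical.

```python
import numpy as np

def spd_sqrt(S):
    """Principal square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

# Each schedule returns (a^2, b^2, d(a^2)/dt, d(b^2)/dt). Assumed forms:
def linear(t):               # a = 1 - t, b = t
    return (1 - t) ** 2, t ** 2, -2 * (1 - t), 2 * t

def variance_preserving(t):  # a = sqrt(1 - t^2), b = t
    return 1 - t ** 2, t ** 2, -2 * t, 2 * t

def cosine(t):               # a = cos(pi t / 2), b = sin(pi t / 2)
    h = np.pi / 2
    return np.cos(h * t) ** 2, np.sin(h * t) ** 2, -h * np.sin(np.pi * t), h * np.sin(np.pi * t)

def generator(t, S0, S1, schedule):
    """A_t = (1/2) dSigma_t/dt Sigma_t^{-1}."""
    al, be, dal, dbe = schedule(t)
    return 0.5 * (dal * S0 + dbe * S1) @ np.linalg.inv(al * S0 + be * S1)

def flow_map(S0, S1, schedule, steps=2000):
    """Integrate dPhi/dt = A_t Phi from Phi_0 = I to t = 1 with classical RK4."""
    Phi = np.eye(S0.shape[0])
    h = 1.0 / steps
    for k in range(steps):
        t = k * h
        k1 = generator(t, S0, S1, schedule) @ Phi
        k2 = generator(t + h / 2, S0, S1, schedule) @ (Phi + h / 2 * k1)
        k3 = generator(t + h / 2, S0, S1, schedule) @ (Phi + h / 2 * k2)
        k4 = generator(t + h, S0, S1, schedule) @ (Phi + h * k3)
        Phi = Phi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return Phi

def closed_form(S0, S1):
    """(S1 S0^{-1})^{1/2} via the similar SPD matrix S0^{-1/2} S1 S0^{-1/2}."""
    R0 = spd_sqrt(S0)
    R0_inv = np.linalg.inv(R0)
    return R0 @ spd_sqrt(R0_inv @ S1 @ R0_inv) @ R0_inv

# Illustrative non-commuting endpoints (assumptions).
S0 = np.diag([1.0, 4.0])
S1 = np.array([[2.0, 1.0], [1.0, 3.0]])

target = closed_form(S0, S1)
maps = {f.__name__: flow_map(S0, S1, f) for f in (linear, variance_preserving, cosine)}
for name, Phi in maps.items():
    print(name, "max deviation from closed form:", np.abs(Phi - target).max())
```

Under these assumptions the three integrated endpoint maps agree with each other and with the closed form to within the integrator's error, even though the intermediate paths \(\Sigma_t\) differ.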
3-Dirac mixture vs OT — the notebooks plot flow-matching trajectories against the Bures geodesic; they diverge exactly off the commuting locus.
OT between Gaussians is closed-form — the \(W_2\)-optimal map is \(T_{\mathrm{OT}}=\Sigma_0^{-1/2}\bigl(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2}\bigr)^{1/2}\Sigma_0^{-1/2}\).
It is the unique symmetric PD solution — of the Bures equation \(T\,\Sigma_0\,T^{\top}=\Sigma_1\).
That is why symmetry = OT — symmetry of \(T_1\) singles out exactly this map.
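The three statements above can be checked on a small example. This is a sketch under assumptions: centred Gaussians, hypothetical helper names, and illustrative 2×2 covariances. It builds \(T_{\mathrm{OT}}\) from the closed form, verifies the Bures equation, and compares it with the affine-schedule endpoint \(T_1=(\Sigma_1\Sigma_0^{-1})^{1/2}\), which solves the same equation but is symmetric only on the commuting locus.

```python
import numpy as np

def spd_sqrt(S):
    """Principal square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def ot_map(S0, S1):
    """W2-optimal linear map between N(0, S0) and N(0, S1)."""
    R0 = spd_sqrt(S0)
    R0_inv = np.linalg.inv(R0)
    return R0_inv @ spd_sqrt(R0 @ S1 @ R0) @ R0_inv

def affine_flow_map(S0, S1):
    """Flow-matching endpoint (S1 S0^{-1})^{1/2} on the affine schedule."""
    R0 = spd_sqrt(S0)
    R0_inv = np.linalg.inv(R0)
    return R0 @ spd_sqrt(R0_inv @ S1 @ R0_inv) @ R0_inv

# Illustrative covariances (assumptions): one commuting pair, one not.
S0 = np.diag([1.0, 4.0])
S1_comm = np.diag([9.0, 1.0])
S1_gen = np.array([[2.0, 1.0], [1.0, 3.0]])

T_ot = ot_map(S0, S1_gen)
T_fm = affine_flow_map(S0, S1_gen)

# Both maps push N(0, S0) onto N(0, S1): each solves T S0 T^T = S1.
bures_ot = np.abs(T_ot @ S0 @ T_ot.T - S1_gen).max()
bures_fm = np.abs(T_fm @ S0 @ T_fm.T - S1_gen).max()

# Only the OT map is symmetric when the covariances do not commute.
asym_ot = np.abs(T_ot - T_ot.T).max()
asym_fm = np.abs(T_fm - T_fm.T).max()

# On the commuting locus the two maps coincide.
gap_comm = np.abs(ot_map(S0, S1_comm) - affine_flow_map(S0, S1_comm)).max()

print("Bures residuals:", bures_ot, bures_fm)
print("asymmetry OT vs flow matching:", asym_ot, asym_fm)
print("gap on commuting pair:", gap_comm)
```

Symmetry is what separates the two solutions of the Bures equation here: the flow-matching endpoint transports the right covariance but is not the symmetric positive-definite one unless \([\Sigma_0,\Sigma_1]=0\).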
A DAG of molecules — the fleet writes its own task graph, workers tackle the nodes, and a referee sub-fleet attacks the output.
Formally pinned — TLA+ proves no worker advances alone and a chief always decides.
The cost — this proof ran as 59 tasks across 78 worker sessions over hours, all reproducible from the recorded event log.
\(\approx\) $596 (\(\approx\) €549) — \(\approx\) 548M tokens, 5,309 turns, 78 worker sessions on claude-opus-4-8, summed from transcript usage at published rates. Units: a task is one molecule; a worker session is one Claude Code session (there are more sessions than tasks, since sub-agents add sessions). The science DAG was 22 tasks (17 plan + 5 repairs); the whole run was 59 tasks across 4 polymerisations.
95% is cache reads — caching turned a \(\approx\) $2,597 input bill into $260, a \(\approx\) $2,337 saving.
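The cost figures can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers quoted on these slides, so every derived ratio is approximate. The variable names are illustrative, and no per-token rates are assumed.

```python
# Rounded figures quoted on the cost slides.
total_usd = 596.0
total_eur = 549.0
tokens = 548e6
turns = 5309
sessions = 78
tasks = 59
input_without_cache_usd = 2597.0
input_with_cache_usd = 260.0

# Saving attributed to caching, and the relative reduction of the input bill.
saving_usd = input_without_cache_usd - input_with_cache_usd
reduction = saving_usd / input_without_cache_usd

# Derived per-unit figures (approximate, because the inputs are rounded).
usd_per_task = total_usd / tasks
usd_per_session = total_usd / sessions
tokens_per_turn = tokens / turns
implied_eur_per_usd = total_eur / total_usd

print(f"caching saving: ${saving_usd:,.0f} ({reduction:.1%} of the uncached input bill)")
print(f"cost per task: ${usd_per_task:.2f}, per worker session: ${usd_per_session:.2f}")
print(f"tokens per turn: {tokens_per_turn:,.0f}")
print(f"implied EUR/USD rate: {implied_eur_per_usd:.3f}")
```

The share of tokens served from cache (95%) and the reduction of the input bill (about 90% here) are different quantities, so they need not match.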
Honest sources — cs ensemble prints live cost, events.jsonl logs transitions; costs.csv is a bytes proxy, transcripts are the truth.
One instrument, many missions — this proof is one of several federated works on the shared Noogram core; each a different problem, the same fleet.
exp-families-stability — Gaussian-convolution-stable exponential families: same notebooks + paper + Lean model as this galaxy, a sibling proof on the same rails.
The rest is on the map — the constellation links every live cousin; noogram.org is the index.