A fleet that watched itself work

produced by a Noogram agent fleet · noogram.org

ENS · CSD coding-agents seminar · 2026-06-18

Emmanuel Sérié

Applied mathematics, then quantitative research — physics PhD, then years inside quantitative research.

Teaching and research today — École Polytechnique, CMAP, the MaQI master.

One question today: can a fleet of agents do real mathematics — and catch itself when wrong?

École Polytechnique CMAP CNRS

Noogram — an instrument built in the open

Noogram

A research programme, not a product — federative agentic AI, human-amplified, never replaced.

Live today — the public core is shipping, not promised.

One instrument, many crafts

commun · noyau · cliquet at the center — every spoke is a different craft run on the same open core. The green dots are live public artifacts: click one to open it.

Code → cosmon.noogram.dev Code dev · revue cosmon.noogram.dev ↗ Maths / Physique théorique Maths / Physique théorique modèles · preuves flow-matching.noogram-labs.devflow-matching.noogram-labs.dev ↗ exp-families.noogram-labs.devexp-families.noogram-labs.dev ↗ Calcul scientifique → oxymake.noogram.dev Calcul scientifique orchestration oxymake.noogram.dev ↗ Cyber — artefact public bientôt Cyber chiffrement bientôt Juridique — artefact public bientôt Juridique NDA · contrats bientôt École d'été IA → qfa-surface.pages.dev École d'été IA co-organisation qfa-surface.pages.dev ↗ Musique — artefact public bientôt Musique composition bientôt Scaena → scaena-lea-london.pages.dev Scaena scénographie scaena-lea-london.pages.dev ↗ Éducation → morpion-play.pages.dev Éducation pédagogie morpion-play.pages.dev ↗ PME — CEO → addl-bopp.pages.dev PME — CEO gouvernance addl-bopp.pages.dev ↗ CTO fractionnel — artefact public bientôt CTO fractionnel architecture bientôt Santé / MedTech → radience.noogram.dev Santé / MedTech imagerie médicale radience.noogram.dev ↗ commun noyau · cliquet

Cosmon — the engine, under your control

Four principles, formally pinned — self-reference, transport-and-cognition, intentions-not-ownership, minimum-action; TLA+-validated.

Built by the method it runs — agents propose, a chief decides.

commits: 10.7k tests: 6.8k Rust: ~490k lines agent co-authored: 1.6k

La Formule 1, pas le pilote — one mission it just ran.

The question, verbatim

One line in a to-do file — a researcher dropped a half-formed maths question into todo.md: flow matching versus optimal transport, on Gaussians.

We changed nothing — that line, untouched, is everything the fleet was given.

Not one agent — a fleet

One chatbot is one researcher — fast, fluent, alone in the room with its own mistakes.

This is a research group — many agents split into branches, with a referee built in: a sub-fleet whose only job is to attack the others' work.

Who does what — the role graph

Three branches, one referee — a question in, the artifacts out; the red-team role exists only to attack the others' work. zoom / more graphs ↗

The plan lays itself down

17 tasks, written in one breath — the opening science plan, before a single worker moves. Four branches: notebooks, paper, proof, self-report.

It rewrites itself mid-run

It found its own mistake and rewrote the plan — 5 repair tasks it never planned punch in, wired backward into a branch it thought was done. zoom / more graphs ↗

The result, stated formally

Schedule-independent; commutation = symmetry = the Bures–Wasserstein OT map:

\[ [\Sigma_0,\Sigma_1]=0 \iff T_1=T_1^{\top} \iff T_1=T_{\mathrm{OT}} \iff T_1=\Sigma_1^{1/2}\Sigma_0^{-1/2} \]

It is machine-checked

No charisma gets past the kernel — a machine read the proof and agreed. This is the part that does not get to bluff.

lake build: exit 0 sorries: 0 project axioms: 0 Lean: v4.29

The verifier caught us — we fixed it for real

A draft of the hard direction was wrong — an invalid step, plus a gap in dimension three and up. The adversarial sub-fleet flagged both.

We closed it, unconditionally — on the straight-line schedule a closed form forces commutation. No prose was patched.

See it yourself

flow-matching-gaussians.pages.dev — the proof, the notebooks, the paper: all live, all generated by the fleet.

The trust is not in the agents — it is in the gate they could not argue with.

adversarial corpus: 9 / 9 sweep: 9000 pairs pass paper · 2 notebooks · Lean proof

Appendix — backup slides

Everything past this point is past the end — reachable during Q&A with the arrow keys or slide overview.

A · The converse — why symmetry forces commutation

Orthogonal factorisation — the flow map splits as \(\Phi_t=\Sigma_t^{1/2}O_t\Sigma_0^{-1/2}\) with \(O_t\in SO(d)\) and \(O_0=I\).

Time-ordering is the obstruction — \(T_1\) is symmetric iff the residual rotation \(O_1=I\); off the commuting locus the time-ordered (Magnus) exponential leaves a real rotation behind.

On the affine schedule it is algebraic — \(T_1=(\Sigma_1\Sigma_0^{-1})^{1/2}\), so symmetry forces \(\Sigma_1\Sigma_0^{-1}\) symmetric, i.e. \([\Sigma_0,\Sigma_1]=0\).

B · The Lean formalisation

Mathlib, kernel-checked — the commuting case is a machine-verified anchor; the shipped source blob matches kernel-provenance.log.

Author is not the scorer — a nine-category adversarial corpus probes the proof: eight must be rejected, one must build.

lake build: exit 0 sorries: 0 project axioms: 0 corpus: 9 / 9

C · Three schedules, one map

Linear, variance-preserving, cosine — three interpolation schedules \((a,b)\) between the same endpoints.

The map does not care — Proposition S: \(T_1\) is identical across all three (zero curvature of the connection \(\omega=\tfrac12\,d\Sigma\,\Sigma^{-1}\)).

3-Dirac mixture vs OT — the notebooks plot flow-matching trajectories against the Bures geodesic; they diverge exactly off the commuting locus.

D · The Bures–Wasserstein OT map

OT between Gaussians is closed-form — the \(W_2\)-optimal map is \(T_{\mathrm{OT}}=\Sigma_0^{-1/2}\bigl(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2}\bigr)^{1/2}\Sigma_0^{-1/2}\).

It is the unique symmetric PD solution — of the Bures equation \(T\,\Sigma_0\,T^{\top}=\Sigma_1\).

That is why symmetry = OT — symmetry of \(T_1\) singles out exactly this map.

E · How a mission actually runs

A DAG of molecules — the fleet writes its own task graph, workers tackle the nodes, and a referee sub-fleet attacks the output.

Formally pinned — TLA+ proves no worker advances alone and a chief always decides.

The cost — this proof ran as 59 tasks across 78 worker sessions over hours, all reproducible from the recorded event log.

F · What it cost

\(\approx\) $596 \((\approx\) €549) — \(\approx\) 548M tokens, 5,309 turns, 78 worker sessions on claude-opus-4-8, summed from transcript usage at published rates. Units: a task is one molecule; a worker session is one Claude Code session (more than tasks, since sub-agents add sessions). The science DAG was 22 tasks (17 plan + 5 repairs); the whole run was 59 tasks across 4 polymerisations.

95% is cache reads — caching turned a \(\approx\) $2,597 input bill into $260, a \(\approx\) $2,337 saving.

Honest sourcescs ensemble prints live cost, events.jsonl logs transitions; costs.csv is a bytes proxy, transcripts are the truth.

G · Sister work — the same engine, other problems

One instrument, many missions — this proof is one of several federated works on the shared Noogram core; each a different problem, the same fleet.

exp-families-stability — Gaussian-convolution-stable exponential families: same notebooks + paper + Lean model as this galaxy, a sibling proof on the same rails.

The rest is on the map — the constellation links every live cousin; noogram.org is the index.