Technical Report: The Network Structure of Mathlib 4
This dashboard reproduces and extends the structural analysis of Lean 4's Mathlib
presented in Li et al. (2026), "The Network Structure of Mathlib"
(arXiv:2604.24797v1). The paper measures Mathlib at commit 534cf0b (Feb 2026,
308 129 declarations, 8.4 M edges, 7 563 modules) using importGraph, jixia, and
lean-training-data.
We rebuild the same family of dependency graphs against Corpus V2 of Mathlib
plus nine community projects pinned to Lean 4.29.0, extracted with the typed-edge
tool aurasoph/lean-graph. Because that
extractor distinguishes six edge kinds (proof, sig, def, field, docref,
extends), our graph captures structural dependencies the paper's pipeline aggregates
out, and surfaces a substantially richer picture of cross-module / cross-namespace
flow.
1. Comparative Scale
Both runs use a single static snapshot. Ours is later, larger, and tracks more edge types — so all deltas should be read as a combined growth + extraction-fidelity signal rather than pure library growth.
The 38% leaf-theorem rate falls below the paper's 44.8% — a side effect of richer
edge extraction: every additional sig/field edge we record can lift a theorem
out of the zero-in-degree bucket. The same mechanism pushes our unique edges /
declaration ratio from 27 (paper) to 33.
2. Edge Type Composition
lean-graph decomposes every dependency into one of six edge kinds. The paper's
declaration graph collapses these into a single edge per (source, target) pair,
losing the layering. We keep them, which lets us track how much of the graph is
driven by each Lean mechanism.
proof (proof-term references) accounts for 52% of typed edges; sig
(statement-level type references) accounts for 35%. The paper reports a
synthesis ratio σ = 74.2% — the share of edges inserted automatically by the
elaborator. We do not separate synthesized from explicit at this layer; instead we
expose the mechanism (sig, def, field, extends) so the ratio can be
recomputed by any combination.
2a. Source–target pair classification
When duplicate (source, target) rows are merged across edge types, every pair falls
into one of three buckets — exactly the paper's Statement only / Proof only / Both
decomposition (§B.2.2).
The paper reports an 8.1 / 43.9 / 48.0 split for Statement / Proof / Both. We see
33% / 41% / 26%. The extra mass on Statement only is structural overhead our
sig/field/extends extraction makes visible: a definition's signature drags in many
typeclasses that the elaborator never needs to revisit during proof checking.
3. Declaration Kind Composition
Every node carries a Lean kind. The paper enumerates eight (Table 10); our extractor
also records instance and class distinctly because they have different structural
roles even though Lean treats them as ordinary definitions.
Theorems are 69% of declarations (paper: 79%). The gap shows up almost entirely as
instance (10%) and class (0.6%) — categories the paper folds back into
definition. Each instance is an automatically resolved structural witness; treating
them as a separate kind is what lets us reproduce the paper's "language infrastructure
vs. mathematical content" hub split (§C.2.2) directly from the kind column.
3a. Per-kind degree
The paper's Table 12 finding survives in our richer corpus: axioms still sit at
the bottom of the dependency chain (mean in-degree ≈ 4 049, mean out-degree ≈ 1.7),
theorems at the top of the production chain (mean out-degree ≈ 49, mean
in-degree ≈ 8), and inductive types form structural skeletons (mean in-degree
≈ 441 — even higher than the paper's 273, reflecting the additional sig edges we
capture). Theorems retain a heavy long tail: 38% are never cited.
4. Inter-Kind Edge Flow
The kind composition is most informative when read as a flow: which kinds produce edges, and which kinds consume them. The matrix below replicates the paper's Table 11.
Theorems → instances dominates (5.8 M edges), reflecting how often proofs cite typeclass witnesses. Theorem → theorem accounts for only 1.8 M — the paper's Definition B.2.4 prediction that "synthesized" infrastructure dwarfs explicit mathematical citation.
5. Degree Distributions
Both the paper (Figure 28) and our corpus exhibit a heavy in-degree tail with a sharply bounded out-degree.
The in-degree maximum in our corpus is OfNat.ofNat). The gap reflects more aggressively captured
synthesized edges around OfNat, Eq, and congrArg, all of which the elaborator
inserts implicitly.
5a. Top-cited declarations
OfNat.ofNat, Eq, congrArg, Membership.mem, DFunLike.coe — the same
top-of-list the paper identifies (Table 18). Ranks 1–10 are saturated by language
infrastructure, exactly the "language infrastructure vs. mathematical content"
two-layer hub structure the paper documents.
5b. Highest out-degree (longest proofs)
Out-degree maxes out at
6. Boundary Classification
The paper measures (§B.1.6) that 37.1% of module-level edges are inter-directory and (§C.2.2 Table 14) that 50.9% of declaration-level edges are cross-namespace. We measure both, using the first dot-separated segment of each declaration's name as its top-level namespace.
In our corpus, 78% of edges cross a top-level namespace — well above the paper's
50.9%. Two effects compound: (1) deeper extraction surfaces more cross-namespace
sig/field edges that the paper's pipeline misses; (2) Mathlib has continued to
absorb subdisciplines (e.g., RingTheory → Algebra), inflating the cross-namespace
density. Same-module edges drop to 8% (paper: 9.6%).
6a. Heaviest cross-module flows
7. Strongly Connected Structure
Mathlib's declaration graph is almost a DAG, but the moment we keep sig,
field, docref, and extends edges, foundational algebra closes a cycle:
Outside the giant 2 980-node "ring core," nearly all SCCs are size 2–8, primarily field ↔ sig pairs introduced by structure projections. The largest non-trivial SCCs beyond the ring core are 317, 114, and 78 nodes — small islands of mutually recursive definitions in category theory and order theory.
7a. The ring core (top 250 hubs)
Edge colour: red = extends, purple = field, blue = sig, grey = mixed/proof.
Node radius = degree within the SCC. Drag a node to pin it; scroll to zoom.
8. Module-Level Hubs
Aggregating declaration-level edges to their defining modules reproduces the paper's "infrastructure vs. integrator" axis (Table 6, Figure 26). High in-edge modules are infrastructure; high out-edge modules are integrators.
Init.Prelude, Mathlib.Algebra.Group.Defs, and Mathlib.Algebra.Ring.Defs are the
top three by in-edges — exactly the paper's top-3 module ranking transcribed via
declarations rather than imports. The points high on the y-axis with low x-axis are
foundational; the points along the diagonal are workhorse algebra/topology files
that both depend on much and are depended on by much.
9. Import Utilization & the "Wasteland"
Median scope utilization: 0.17%. The paper reports an import-edge utilization median of 1.6% (§B.2.11) — a strictly easier metric, since it is computed per direct import edge rather than per transitive scope. Both metrics agree on the qualitative finding: the typical Mathlib file pulls in two-to-three orders of magnitude more declarations than its proofs ever touch.
10. Zero-Citation Namespaces
Declarations the rest of the corpus never cites form Mathlib's "outer surface." Following Table 13, we group theorems by their first two name components and rank by zero-citation rate.
The paper's pattern repeats: namespaces produced by @[to_additive] mirroring or by
generic data-structure libraries (Std.HashSet, Std.TreeSet, Std.ExtTreeSet) are
overwhelmingly leaves, while concrete-mathematics namespaces (Std.Internal,
Profinite.NobelingProof, Topology.IsInducing) are nearly fully consumed.
11. Community Project Footprint
We measured nine community projects pinned to the same Mathlib 4.29 toolchain. Each project is treated as an independent corpus that cites into Mathlib through cross-project edges.
combinatorial-games (2.9k declarations, 92k cross-edges into Mathlib) and apap
(786 / 58k) lead the field. The ratio of cross-edges to own-declarations is the
"Mathlib gravity" of each project: how many premises a single new declaration
typically cites in Mathlib. For cam-combi it is ~48; for combinatorial-games
it is ~32; the Mathlib ecosystem behaves as a dense citation pool, not a thin
prelude.
12. Replication Checklist
| Paper claim | Paper value | This corpus |
|---|---|---|
| Nodes (§B.2.1) | 308 129 | |
| Modules (§B.1.1) | 7 563 | |
| Unique edges | 8.44 M | |
| Theorems share of D (Table 10) | 79.1% | |
| Leaf-theorem rate (§C.2.2) | 44.8% | |
| Cross-namespace edge share (Table 14) | 50.9% | |
| Same-module edge share (Table 14) | 9.6% | |
| Largest non-trivial SCC | (paper: DAG by construction) | |
| Top-1 in-degree node | OfNat.ofNat (89 936) | |
| Top-3 in-degree (semantic) | CategoryTheory.Category, Real, TopologicalSpace |
Where the paper measures only the merged declaration graph (one edge per pair, no
type label), our typed-edge extraction lets the same metrics be re-derived per edge
kind. Every chart on this page is rebuilt from build_data.py, which queries the
v2 corpus DB at
formalized_graph_v2/data/generated/corpus_v2_mathlib_plus_v4.29.db.
Data provenance
Every chart on this page is generated from a single corpus database via
build_data.py. The manifest below records which DB produced this build, when,
and which projects were ingested.
To rebuild against a different corpus:
make data DB=/path/to/other_corpus.db TAG=v4.28-2026-06-01
make site
make deploy # ships to Cloudflare Pages
Citations & Acknowledgments
- Foundational study: Li, Peng, Severini, Shafto (2026), "The Network Structure of Mathlib" (arXiv:2604.24797v1).
- Typed extraction:
aurasoph/lean-graph. - Community projects index: Lean Prover Community Projects.
- Reproduction: rebuild data with
python3 build_data.pyfrom this directory; the corpus DB and the underlying SLURM extraction logs live underformalized_graph_v2/.