Technical Report: The Network Structure of Mathlib 4

This dashboard reproduces and extends the structural analysis of Lean 4's Mathlib presented in Li et al. (2026), "The Network Structure of Mathlib" (arXiv:2604.24797v1). The paper measures Mathlib at commit 534cf0b (Feb 2026, 308 129 declarations, 8.4 M edges, 7 563 modules) using importGraph, jixia, and lean-training-data.

We rebuild the same family of dependency graphs against Corpus V2 of Mathlib plus nine community projects pinned to Lean 4.29.0, extracted with the typed-edge tool aurasoph/lean-graph. Because that extractor distinguishes six edge kinds (proof, sig, def, field, docref, extends), our graph captures structural dependencies the paper's pipeline aggregates out, and surfaces a substantially richer picture of cross-module / cross-namespace flow.

1. Comparative Scale

Both runs use a single static snapshot. Ours is later, larger, and tracks more edge types — so all deltas should be read as a combined growth + extraction-fidelity signal rather than pure library growth.

The 38% leaf-theorem rate falls below the paper's 44.8% — a side effect of richer edge extraction: every additional sig/field edge we record can lift a theorem out of the zero-in-degree bucket. The same mechanism pushes our unique edges / declaration ratio from 27 (paper) to 33.

2. Edge Type Composition

lean-graph decomposes every dependency into one of six edge kinds. The paper's declaration graph collapses these into a single edge per (source, target) pair, losing the layering. We keep them, which lets us track how much of the graph is driven by each Lean mechanism.

proof (proof-term references) accounts for 52% of typed edges; sig (statement-level type references) accounts for 35%. The paper reports a synthesis ratio σ = 74.2% — the share of edges inserted automatically by the elaborator. We do not separate synthesized from explicit at this layer; instead we expose the mechanism (sig, def, field, extends) so the ratio can be recomputed by any combination.

2a. Source–target pair classification

When duplicate (source, target) rows are merged across edge types, every pair falls into one of three buckets — exactly the paper's Statement only / Proof only / Both decomposition (§B.2.2).

The paper reports an 8.1 / 43.9 / 48.0 split for Statement / Proof / Both. We see 33% / 41% / 26%. The extra mass on Statement only is structural overhead our sig/field/extends extraction makes visible: a definition's signature drags in many typeclasses that the elaborator never needs to revisit during proof checking.

3. Declaration Kind Composition

Every node carries a Lean kind. The paper enumerates eight (Table 10); our extractor also records instance and class distinctly because they have different structural roles even though Lean treats them as ordinary definitions.

Theorems are 69% of declarations (paper: 79%). The gap shows up almost entirely as instance (10%) and class (0.6%) — categories the paper folds back into definition. Each instance is an automatically resolved structural witness; treating them as a separate kind is what lets us reproduce the paper's "language infrastructure vs. mathematical content" hub split (§C.2.2) directly from the kind column.

3a. Per-kind degree

The paper's Table 12 finding survives in our richer corpus: axioms still sit at the bottom of the dependency chain (mean in-degree ≈ 4 049, mean out-degree ≈ 1.7), theorems at the top of the production chain (mean out-degree ≈ 49, mean in-degree ≈ 8), and inductive types form structural skeletons (mean in-degree ≈ 441 — even higher than the paper's 273, reflecting the additional sig edges we capture). Theorems retain a heavy long tail: 38% are never cited.

4. Inter-Kind Edge Flow

The kind composition is most informative when read as a flow: which kinds produce edges, and which kinds consume them. The matrix below replicates the paper's Table 11.

Theorems → instances dominates (5.8 M edges), reflecting how often proofs cite typeclass witnesses. Theorem → theorem accounts for only 1.8 M — the paper's Definition B.2.4 prediction that "synthesized" infrastructure dwarfs explicit mathematical citation.

5. Degree Distributions

Both the paper (Figure 28) and our corpus exhibit a heavy in-degree tail with a sharply bounded out-degree.

The in-degree maximum in our corpus is edges (vs. the paper's 89 936 for OfNat.ofNat). The gap reflects more aggressively captured synthesized edges around OfNat, Eq, and congrArg, all of which the elaborator inserts implicitly.

5a. Top-cited declarations

OfNat.ofNat, Eq, congrArg, Membership.mem, DFunLike.coe — the same top-of-list the paper identifies (Table 18). Ranks 1–10 are saturated by language infrastructure, exactly the "language infrastructure vs. mathematical content" two-layer hub structure the paper documents.

5b. Highest out-degree (longest proofs)

Out-degree maxes out at in our snapshot. The paper records 522 — the gap is again the extra edge types we attach to each declaration. The out-degree leaderboard is dominated by combinatorial and analysis files where a single proof unfolds many premises.

6. Boundary Classification

The paper measures (§B.1.6) that 37.1% of module-level edges are inter-directory and (§C.2.2 Table 14) that 50.9% of declaration-level edges are cross-namespace. We measure both, using the first dot-separated segment of each declaration's name as its top-level namespace.

In our corpus, 78% of edges cross a top-level namespace — well above the paper's 50.9%. Two effects compound: (1) deeper extraction surfaces more cross-namespace sig/field edges that the paper's pipeline misses; (2) Mathlib has continued to absorb subdisciplines (e.g., RingTheoryAlgebra), inflating the cross-namespace density. Same-module edges drop to 8% (paper: 9.6%).

6a. Heaviest cross-module flows

7. Strongly Connected Structure

Mathlib's declaration graph is almost a DAG, but the moment we keep sig, field, docref, and extends edges, foundational algebra closes a cycle:

Outside the giant 2 980-node "ring core," nearly all SCCs are size 2–8, primarily field ↔ sig pairs introduced by structure projections. The largest non-trivial SCCs beyond the ring core are 317, 114, and 78 nodes — small islands of mutually recursive definitions in category theory and order theory.

7a. The ring core (top 250 hubs)

Edge colour: red = extends, purple = field, blue = sig, grey = mixed/proof. Node radius = degree within the SCC. Drag a node to pin it; scroll to zoom.

8. Module-Level Hubs

Aggregating declaration-level edges to their defining modules reproduces the paper's "infrastructure vs. integrator" axis (Table 6, Figure 26). High in-edge modules are infrastructure; high out-edge modules are integrators.

Init.Prelude, Mathlib.Algebra.Group.Defs, and Mathlib.Algebra.Ring.Defs are the top three by in-edges — exactly the paper's top-3 module ranking transcribed via declarations rather than imports. The points high on the y-axis with low x-axis are foundational; the points along the diagonal are workhorse algebra/topology files that both depend on much and are depended on by much.

9. Import Utilization & the "Wasteland"

Median scope utilization: 0.17%. The paper reports an import-edge utilization median of 1.6% (§B.2.11) — a strictly easier metric, since it is computed per direct import edge rather than per transitive scope. Both metrics agree on the qualitative finding: the typical Mathlib file pulls in two-to-three orders of magnitude more declarations than its proofs ever touch.

10. Zero-Citation Namespaces

Declarations the rest of the corpus never cites form Mathlib's "outer surface." Following Table 13, we group theorems by their first two name components and rank by zero-citation rate.

The paper's pattern repeats: namespaces produced by @[to_additive] mirroring or by generic data-structure libraries (Std.HashSet, Std.TreeSet, Std.ExtTreeSet) are overwhelmingly leaves, while concrete-mathematics namespaces (Std.Internal, Profinite.NobelingProof, Topology.IsInducing) are nearly fully consumed.

11. Community Project Footprint

We measured nine community projects pinned to the same Mathlib 4.29 toolchain. Each project is treated as an independent corpus that cites into Mathlib through cross-project edges.

combinatorial-games (2.9k declarations, 92k cross-edges into Mathlib) and apap (786 / 58k) lead the field. The ratio of cross-edges to own-declarations is the "Mathlib gravity" of each project: how many premises a single new declaration typically cites in Mathlib. For cam-combi it is ~48; for combinatorial-games it is ~32; the Mathlib ecosystem behaves as a dense citation pool, not a thin prelude.

12. Replication Checklist

Paper claim Paper value This corpus
Nodes (§B.2.1) 308 129
Modules (§B.1.1) 7 563
Unique edges 8.44 M M
Theorems share of D (Table 10) 79.1% %
Leaf-theorem rate (§C.2.2) 44.8% %
Cross-namespace edge share (Table 14) 50.9% %
Same-module edge share (Table 14) 9.6% %
Largest non-trivial SCC (paper: DAG by construction) nodes
Top-1 in-degree node OfNat.ofNat (89 936) ()
Top-3 in-degree (semantic) CategoryTheory.Category, Real, TopologicalSpace

Where the paper measures only the merged declaration graph (one edge per pair, no type label), our typed-edge extraction lets the same metrics be re-derived per edge kind. Every chart on this page is rebuilt from build_data.py, which queries the v2 corpus DB at formalized_graph_v2/data/generated/corpus_v2_mathlib_plus_v4.29.db.

Data provenance

Every chart on this page is generated from a single corpus database via build_data.py. The manifest below records which DB produced this build, when, and which projects were ingested.

To rebuild against a different corpus:

make data DB=/path/to/other_corpus.db TAG=v4.28-2026-06-01
make site
make deploy   # ships to Cloudflare Pages

Citations & Acknowledgments