TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Wang, Ruida; Zhang, Jipeng; Jia, Yizhen; Pan, Rui; Diao, Shizhe; Pi, Renjie; Zhang, Tong

arXiv:2407.03203 (cs)

[Submitted on 3 Jul 2024 (v1), last revised 4 Oct 2024 (this version, v2)]

Abstract: Proving mathematical theorems using computer-verifiable formal languages like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using Large Language Models (LLMs) based on Natural Language (NL) proofs. However, due to the scarcity of aligned NL and Formal Language (FL) theorem-proving data, most modern LLMs exhibit suboptimal performance. This scarcity results in a paucity of methodologies for training LLMs and of techniques to fully utilize their capabilities in composing formal proofs. To address these challenges, this paper proposes TheoremLlama, an end-to-end framework that trains a general-purpose LLM to be a Lean4 expert. TheoremLlama includes an NL-FL dataset generation and bootstrapping method to obtain an aligned dataset, curriculum learning and block training techniques to train the model, and an iterative proof writing method to write Lean4 proofs; these components are designed to work together. Using the dataset generation method in TheoremLlama, we provide Open Bootstrapped Theorems (OBT), an NL-FL aligned and bootstrapped dataset. Our novel NL-FL bootstrapping method, in which NL proofs are integrated into Lean4 code for training datasets, leverages the NL reasoning ability of LLMs for formal reasoning. The TheoremLlama framework achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and MiniF2F-Test datasets respectively, surpassing the GPT-4 baseline of 22.95% and 25.41%. Our code, model checkpoints, and the generated dataset are published on GitHub.
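As a rough illustration of what "NL proofs integrated into Lean4 code" could look like, the sketch below embeds natural-language reasoning as comments next to the formal steps. It assumes the NL text is carried as Lean comments; the theorem, its name, and the comment wording are hypothetical and are not taken from the OBT dataset.

```lean
-- Hypothetical NL-FL aligned example (illustrative only, not from OBT).
-- NL statement: for every natural number n, adding zero on the right gives n back.
theorem add_zero_right_example (n : Nat) : n + 0 = n := by
  -- NL proof: by the definition of addition on natural numbers,
  -- n + 0 unfolds directly to n, so both sides are identical.
  rfl
```

A training record of this shape pairs each formal step with the informal reasoning that motivates it, which is one way a model's NL reasoning ability might be brought to bear on writing the formal proof.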

Subjects:	Formal Languages and Automata Theory (cs.FL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2407.03203 [cs.FL]
	(or arXiv:2407.03203v2 [cs.FL] for this version)
	https://doi.org/10.48550/arXiv.2407.03203

Submission history

From: Ruida Wang
[v1] Wed, 3 Jul 2024 15:36:18 UTC (2,662 KB)
[v2] Fri, 4 Oct 2024 03:06:26 UTC (3,146 KB)
