Skip to content

Commit

Permalink
formal2formal
Browse files Browse the repository at this point in the history
  • Loading branch information
wellecks committed Oct 11, 2023
0 parents commit adf9240
Show file tree
Hide file tree
Showing 23 changed files with 22,606 additions and 0 deletions.
110 changes: 110 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@

*.pyc
*.tar.gz
# JetBrains PyCharm IDE
.idea/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# macOS dir files
.DS_Store

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# dotenv
.env

# virtualenv
.venv
venv/
ENV/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
57 changes: 57 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
# `llemma` formal2formal

Scripts for the Lean formal2formal (tactic prediction) experiments in \
[Llemma: an open language model for mathematics [Azerbayev et al 2023]]()

#### Setup
Install Python packages:
```
pip install -r requirements.txt
```

Install Lean:
```
# from https://leanprover-community.github.io/install/linux.html
# After running this command, select (2), then `nightly`, then `y`:
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
source $HOME/.elan/env
lake
```

Configure LeanDojo:
```
export CONTAINER="native"
```

#### Run
See `scripts`

#### Compute metrics

```bash
python compute_metrics.py
==>

codellama7b_minif2f_test 0.20491803278688525 50 244
codellama34b_minif2f_test 0.22131147540983606 54 244
llemma7b_minif2f_test 0.26229508196721313 64 244
llemma34b_minif2f_test 0.2581967213114754 63 244
```



### Troubleshooting
- We observe a Ray error when running the 34b script (with VLLM `--tp-degree > 1`) on an untraced LeanDojo repo. A workaround is to run the 7b script with `--tp-degree 1` such that LeanDojo completes tracing the repo. Then run the 34b script with `--tp-degree > 1`.

### Citation
Please cite the following:
```
@article{azerbayev2023llemma,
title={Llemma: an open language model for mathematics},
author={Zhangir Azerbayev and Hailey Schoelkopf and Keiran Paster and Marco Dos Santos and Stephen McAleer and Albert Q. Jiang and Jia Deng and Stella Biderman and Sean Welleck},
eprint={xyz.xyz},
archivePrefix={arXiv}
year={2023}
}
```

26 changes: 26 additions & 0 deletions compute_metrics.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
import json
import glob

for setting in [
'codellama7b_minif2f_test',
'codellama34b_minif2f_test',
'llemma7b_minif2f_test',
'llemma34b_minif2f_test'
]:
fs = [json.load(open(x)) for x in glob.glob('./output/%s/*.json' % setting)]
n = 0
ns = 0
for f in fs:
for result in f['results']:
name = result['example']['full_name']

# Extra helper theorem in the OpenAI code
if 'sum_pairs' in name:
continue

n += 1
if result['success']:
ns += 1

print(setting, ns/n, ns, n, sep='\t')

488 changes: 488 additions & 0 deletions data/minif2f.jsonl

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions output/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Please note:
- Llemma34b was run with three shards: 1 shard for the first 122 theorems, and two for the remaining.
Loading

0 comments on commit adf9240

Please sign in to comment.