## NaturalProofs: Mathematical Theorem Proving in Natural Language [NaturalProofs: Mathematical Theorem Proving in Natural Language](https://cs.nyu.edu/~welleck/welleck2021naturalproofs.pdf)\ Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4632538.svg)](https://doi.org/10.5281/zenodo.4632538) This repo contains: - The **NaturalProofs Dataset** - **Tokenized task data** for mathematical reference retrieval and generation. - **Preprocessing** NaturalProofs and the task data. - **Training** and **evaluation** for mathematical reference retrieval and generation. - **Pretrained models** for mathematical reference retrieval and generation. Please cite our work if you found the resources in this repository useful: ``` @inproceedings{welleck2021naturalproofs, title={NaturalProofs: Mathematical Theorem Proving in Natural Language}, author={Sean Welleck and Jiacheng Liu and Ronan Le Bras and Hannaneh Hajishirzi and Yejin Choi and Kyunghyun Cho}, booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)}, year={2021}, url={https://openreview.net/forum?id=Jvxa8adr3iY} } ``` ## Quick download To download and unpack NaturalProofs, use: ``` pip install gdown python download.py --naturalproofs --savedir /path/to/savedir ``` To download and unpack all files that we describe below, use: ``` python download.py --naturalproofs --tokenized --checkpoint --other --savedir /path/to/savedir ``` This creates the following file structure: ``` {savedir}/data # contains NaturalProofs base data (.json files) and tokenized task data (.pkl files) {savedir}/ckpt # contains pretrained model checkpoints {savedir}/other # contains precomputed files for evaluation (ref encodings, etc.) ``` ## NaturalProofs Dataset We provide the NaturalProofs Dataset (JSON per domain): | NaturalProofs Dataset [[zenodo](https://doi.org/10.5281/zenodo.4632538)]| Domain| |-|-| |[naturalproofs_proofwiki.json](https://zenodo.org/record/4902289/files/naturalproofs_proofwiki.json?download=1)|ProofWiki| |[naturalproofs_stacks.json](https://zenodo.org/record/4902289/files/naturalproofs_stacks.json?download=1)|Stacks| |[naturalproofs_trench.json](https://zenodo.org/record/4902202/files/naturalproofs_trench.json?download=1)|Real Analysis textbook| |[naturalproofs_stein.json](https://zenodo.org/record/4902289/files/naturalproofs_stein.py?download=1) (script)|Number Theory textbook| To download NaturalProofs, use: ``` python download.py --naturalproofs --savedir /path/to/savedir ``` #### Combined ProofWiki+Stacks The download includes an extra combined ProofWiki+Stacks file made with [notebooks/merge.ipynb](notebooks/merge.ipynb). #### Preprocessing To see the steps used to create each domain of NaturalProofs from raw data, see the following notebooks.\ This preprocessing is **not needed** if you are using a preprocessed dataset provided above. | Domain| | |-|-| |ProofWiki|[notebook](notebooks/parse_proofwiki.ipynb)| |Stacks|[notebook](notebooks/parse_stacks.ipynb)| |Real Analysis textbook|[notebook](notebooks/parse_textbooks.ipynb)| |Number Theory textbook|[notebook](notebooks/parse_textbooks.ipynb)| ## Mathematical Reference Retrieval and Generation To use NaturalProofs for the reference retrieval and generation tasks described in the paper, the first step is tokenization. ### Tokenized dataset We tokenize the raw NaturalProofs Dataset into two different formats: - **Pairwise**: `(x, r)` - `x` theorem (sequence of tokens) - `r` reference (sequence of tokens) - This version is used to train and evaluate the **pairwise model**. - **Sequence**: `(x, [rid_1, ..., rid_Tx])` - `x` theorem (sequence of tokens) - `rid_i` reference id - This version is used to train and evaluate the **autoregressive** and **joint** models. We provide the following versions used in the paper (`bert-based-cased` tokenizer): | Type | Domain| Splits| |-|-|-| |Pairwise, `bert-base-cased`|Proofwiki | train,valid,test | ||Stacks| train,valid,test | ||Real Analysis (textbook))| test | ||Number Theory (textbook)| test | |Sequence, `bert-base-cased`|Proofwiki | train,valid,test | ||Stacks| train,valid,test | ||Real Analysis (textbook)| test | ||Number Theory (textbook)| test | To download and unpack them, use: ``` python download.py --tokenized --savedir /path/to/savedir ``` Or use [google drive link](https://drive.google.com/file/d/1OCIvcCyKTyRJeV7QiHdtQQhPJ6QknMpV/view?usp=sharing). ### Pretrained Models We provide the following models used in the paper: | Type | |Domain| |-|-|-| |**Pairwise**|`bert-base-cased`|Proofwiki | [link]()| |**Pairwise**|`bert-base-cased`|Stacks| [link]()| |**Pairwise**|`bert-base-cased`|Proofwiki+Stacks | [link]()| |**Joint**|`bert-base-cased`|Proofwiki | [link]()| |**Joint**|`bert-base-cased`|Stacks| [link]()| |**Joint**|`bert-base-cased`|Proofwiki+Stacks | [link]()| |**Autoregressive**|`bert-base-cased`|Proofwiki | |**Autoregressive**|`bert-base-cased`|Stacks| [link]()| To download and unpack them, use: ``` python download.py --checkpoint --savedir /path/to/savedir ``` Or use [google drive link](https://drive.google.com/file/d/1uIBeI7fw5vJBhDOl2WL3SbXWmzHgfK3W/view?usp=sharing). ### Creating your own tokenized dataset This step is **not needed** if you are using a tokenized dataset provided above.\ First, setup the code: ```bash python setup.py develop ``` To create your own tokenized versions: - **Pairwise**: `python naturalproofs/tokenize_pairwise.py` - **Sequence**: `python naturalproofs/encoder_decoder/utils.py` ## Evaluation We will show you how to run evaluation on the pretrained model checkpoints & associated files. ### Setup We will assume the file structure given by using the download script. ```bash python download.py --naturalproofs --tokenized --checkpoint --other --savedir ``` We provide a script which assembles an evaluation command for `(model type, domain, task)` combinations.\ We show example commands below. ```bash python run_analysis.py \ --train-ds-names {proofwiki stacks}+ \ # one or more training domains to choose a model --eval-ds-names {proofwiki stacks stein trench}+ \ # one or more evaluation domains --model {pairwise, joint, autoregressive} \ --generation \ # for generation task (autoregressive or joint models only) --split {valid, test} \ --gpu \ --codedir /path/to/naturalproofs_code \ --datadir /data \ --ckptdir /ckpt \ --outdir /output ``` To make sure your filepaths line up, please look inside `run_analysis.py` to see how the `--{}dir` arguments are used. #### Example: pairwise retrieval ``` python run_analysis.py --train-ds-names proofwiki \ --eval-ds-names proofwiki stein trench \ --model pairwise \ --gpu 1 \ --split test ``` #### Example: joint retrieval ``` python run_analysis.py --train-ds-names proofwiki \ --eval-ds-names proofwiki \ --model joint \ --gpu 1 \ --split test ``` #### Example: joint retrieval OOD For OOD evaluation on `stein` and `trench` textbooks, provide reference embeddings from the pairwise model.\ These are the `__encs.pt` files from running the pairwise retrieval evaluation (we provide an example in `other/`). ``` python run_analysis.py --model joint \ --train-ds-names proofwiki \ --eval-ds-names stein trench \ --stein-rencs /other/pairwise__train_proofwiki__eval_stein__test__encs.pt \ --trench-rencs /other/pairwise__train_proofwiki__eval_trench__test__encs.pt \ --gpu 1 \ --split test ``` #### Example: joint retrieval proofwiki+stacks model To align the model's combined output space with the individual dataset used for evaluation, give a `tok2tok.pkl` map (we provide an example in `other/`): ``` python run_analysis.py --model joint \ --train-ds-names both \ --eval-ds-names proofwiki stacks \ --modeltok2datatok /other/tok2tok.pkl \ --gpu 1 \ --split test ``` Note that OOD evaluation (`stein` or `trench`) is not implemented for the combined model. #### Example: autoregressive retrieval Without the `--generation` flag, adjusts settings for retrieval evaluation: ``` python run_analysis.py --model autoregressive \ --train-ds-names proofwiki \ --eval-ds-names proofwiki \ --gpu 1 \ --split valid ``` Note that OOD evaluation (`stein` or `trench`) is not implemented for the autoregressive model. #### Example: autoregressive generation ``` python run_analysis.py --model autoregressive --generation \ --train-ds-names proofwiki \ --eval-ds-names proofwiki \ --gpu 1 \ --split valid ``` ## Training The provided code supports: - Training a **pairwise** model - Training an **autoregressive** or **joint** model, initialized with pairwise model components (parameters, reference embeddings) #### Training a pairwise model ```bash python naturalproofs/model.py --expr-name pairwise \ --datapath /path/to/_tokenized__bert-base-cased.pkl \ --default-root-dir /path/to/output ``` #### Training a joint model The joint (and autoregressive) model uses a pairwise checkpoint, and reference encodings for initialization. - The pairwise checkpoint is saved during pairwise training. - The reference encodings are saved in a `encs.pt` file during pairwise Evaluation. ```bash python naturalproofs/encoder_decoder/model.py \ --datapath /path/to/sequence__tokenized__bert-base-cased.pkl \ --default-root-dir /path/to/output --pretrained-retrieval-checkpoint /path/to/pairwise__.ckpt \ --encs-file /path/to/train___eval___valid__encs.pt \ # obtained from running evaluation on trained pairwise model --parallel 1 \ --set-mode 1 # discard duplicates ``` Our implementation uses the same encoder-decoder architecture for the autoregressive and joint models, considering the joint model as a one-step special case (with KL-div loss). See the Appendix for a discussion on this design decision and technical details. #### Training an autoregressive model ```bash python naturalproofs/encoder_decoder/model.py \ --datapath /path/to/sequence__tokenized__bert-base-cased.pkl \ --default-root-dir /path/to/output --pretrained-retrieval-checkpoint /path/to/pairwise__.ckpt \ --encs-file /path/to/train___eval___valid__encs.pt \ # obtained from running evaluation on trained pairwise model --parallel 0 \ --set-mode 0 # keep duplicates ``` ### Non-neural baselines TF-IDF example: ```bash python naturalproofs/baselines.py \ --method tfidf \ --datapath /path/to/_tokenized__bert-base-cased_200.pkl \ --datapath-base /path/to/naturalproofs_.json \ --savedir /path/to/out/ ==> /path/to/out/tfidf__eval.pkl ``` Then use `analyze.py` to compute metrics: ```bash python naturalproofs/analyze.py \ --method tfidf \ --eval-path /path/to/out/tfidf__eval.pkl \ --datapath-base /path/to/naturalproofs_.json ==> /path/to/out/tfidf__analysis.pkl ```