Repository for our TACL 2022 paper "MuSiQue: Multi-hop Questions via Single-hop Question Composition"
MuSiQue is distributed under a CC BY 4.0 License.
Usage Caution: If you're using any of our seed single-hop datasets (SQuAD, T-REx, Natural Questions, MLQA, Zero Shot RE) in any way (e.g., pretraining on them), please note that MuSiQue was created by composing questions from these seed datasets. Therefore, single-hop questions used in MuSiQue's dev/test sets may occur in the training sets of these seed datasets. To help avoid information leakage, we are releasing the IDs of single-hop questions that are used in MuSiQue dev/test sets. Once you download the data below, these IDs and corresponding questions will be in data/dev_test_singlehop_questions_v1.0.json. If you use our seed single-hop datasets in any way in your model, please be sure to avoid using any single-hop question IDs present in this file.
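One way to apply this caution is to filter seed-dataset examples against the released IDs before training. The sketch below is illustrative only: the layout of the JSON file is not described here, so `collect_ids` simply gathers every value stored under an assumed `id` key, and `drop_leaked` assumes each seed example is a dict carrying its ID under the same key. Both names are hypothetical helpers, not part of this repository.

```python
import json


def collect_ids(node, id_key="id"):
    """Recursively collect every value stored under `id_key` in nested JSON.

    Assumption: IDs live under a key named `id_key`; adjust to the real layout.
    """
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == id_key and isinstance(value, (str, int)):
                found.add(str(value))
            else:
                found |= collect_ids(value, id_key)
    elif isinstance(node, list):
        for item in node:
            found |= collect_ids(item, id_key)
    return found


def drop_leaked(examples, blocked_ids, id_key="id"):
    """Keep only the examples whose ID is not in `blocked_ids`."""
    return [ex for ex in examples if str(ex.get(id_key)) not in blocked_ids]


def load_blocked_ids(path="data/dev_test_singlehop_questions_v1.0.json"):
    """Read the released file and return the set of IDs to avoid."""
    with open(path, encoding="utf-8") as handle:
        return collect_ids(json.load(handle))
```

After downloading the data, something like `drop_leaked(seed_examples, load_blocked_ids())` would give a training set without the listed questions, provided the ID formats of the two sources match.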
To download MuSiQue, either run the following script or download it manually from here.
bash download_data.sh
The result will be stored in the data/ directory. It contains (i) train, dev and test sets of MuSiQue-Ans and MuSiQue-Full, and (ii) single-hop questions and IDs from the source datasets (SQuAD, Natural Questions, T-REx, MLQA, Zero Shot RE) that are part of dev or test of MuSiQue.
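As a quick sanity check after downloading, the `.jsonl` files can be read line by line. This is a minimal sketch that assumes standard JSON Lines (one JSON object per line); `read_jsonl` and `summarize` are hypothetical helpers, and no particular field names are assumed.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(path):
    """Return (record count, sorted top-level keys of the first record)."""
    count, keys = 0, []
    for record in read_jsonl(path):
        if count == 0:
            keys = sorted(record)
        count += 1
    return count, keys
```

For example, `summarize("data/musique_ans_v1.0_dev.jsonl")` reports how many instances the dev set holds and which fields each instance carries.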
We're releasing the model predictions (in the official format) for 4 models on the dev sets of MuSiQue-Ans and MuSiQue-Full. To get them, you can run the following script or download them manually from here.
bash download_predictions.sh
You can use evaluate_v1.0.py to evaluate your predictions against the ground truth. For example:
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
These are the results you would get for MuSiQue-Answerable and MuSiQue-Full validation sets and for each of the four models (End2End Model, Select+Answer Model, Execution by End2End Model, Execution by Select+Answer Model).
# MuSiQue-Answerable
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.423, "support_f1": 0.676}
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_select_answer_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.473, "support_f1": 0.723}
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_step_execution_by_end2end_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.456, "support_f1": 0.778}
python evaluate_v1.0.py predictions/musique_ans_v1.0_dev_step_execution_by_select_answer_model_predictions.jsonl data/musique_ans_v1.0_dev.jsonl
# => {"answer_f1": 0.497, "support_f1": 0.792}
# MuSiQue-Full
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_end2end_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.406, "support_f1": 0.325, "group_answer_sufficiency_f1": 0.22, "group_support_sufficiency_f1": 0.252}
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_select_answer_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.486, "support_f1": 0.522, "group_answer_sufficiency_f1": 0.344, "group_support_sufficiency_f1": 0.42}
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_step_execution_by_end2end_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.463, "support_f1": 0.75, "group_answer_sufficiency_f1": 0.321, "group_support_sufficiency_f1": 0.447}
python evaluate_v1.0.py predictions/musique_full_v1.0_dev_step_execution_by_select_answer_model_predictions.jsonl data/musique_full_v1.0_dev.jsonl
# => {"answer_f1": 0.498, "support_f1": 0.777, "group_answer_sufficiency_f1": 0.328, "group_support_sufficiency_f1": 0.431}
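The eight evaluations above follow one file-naming pattern, so they can be scripted. The sketch below builds the same commands in Python; `evaluation_command` and `evaluate_all` are hypothetical helpers, and `evaluate_all` assumes the script prints its metrics as a JSON object on the last line of stdout, which the `# =>` lines suggest but which is not confirmed here.

```python
import json
import subprocess
import sys

DATASETS = ("ans", "full")
MODELS = (
    "end2end_model",
    "select_answer_model",
    "step_execution_by_end2end_model",
    "step_execution_by_select_answer_model",
)


def evaluation_command(dataset, model):
    """Build the evaluate_v1.0.py command for one dataset/model pair."""
    prediction = f"predictions/musique_{dataset}_v1.0_dev_{model}_predictions.jsonl"
    ground_truth = f"data/musique_{dataset}_v1.0_dev.jsonl"
    return [sys.executable, "evaluate_v1.0.py", prediction, ground_truth]


def evaluate_all():
    """Run all evaluations and return {(dataset, model): metrics}.

    Assumption: the last stdout line of the script is a JSON object.
    """
    results = {}
    for dataset in DATASETS:
        for model in MODELS:
            completed = subprocess.run(
                evaluation_command(dataset, model),
                capture_output=True, text=True, check=True,
            )
            last_line = completed.stdout.strip().splitlines()[-1]
            results[(dataset, model)] = json.loads(last_line)
    return results
```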
We have two leaderboards for MuSiQue: MuSiQue-Answerable and MuSiQue-Full.
Once you have the test set predictions in the official format, just upload the files to the above leaderboards. Feel free to contact me (Harsh) if you have any questions.
We've released the code that we used for the experiments in the paper. If you're interested in trying our trained models, training them from scratch, viewing their predictions, or generating predictions from your own trained model, follow the steps below.
# Set up the environment.
conda create -n musique python=3.8 -y && conda activate musique
# Set up allennlp in the root directory
git clone https://github.com/allenai/allennlp
cd allennlp
git checkout v2.1.0
git apply ../allennlp.diff # small diff to get longformer global attention to work correctly.
cd ..
pip install allennlp==2.1.0 # we only need dependencies of allennlp
pip uninstall -y allennlp
pip install gdown==v4.5.1
python -m nltk.downloader stopwords
pip uninstall -y transformers
pip install transformers==4.7.0 # we used this version of transformers
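Because the setup pins specific versions, it can help to confirm what actually got installed before training. The following is a minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8); `EXPECTED` and `version_mismatches` are hypothetical names, and the check covers only the two pip pins above, not the patched allennlp checkout.

```python
from importlib import metadata

# Versions pinned by the setup commands above.
EXPECTED = {"transformers": "4.7.0", "gdown": "4.5.1"}


def version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for packages that differ or are missing."""
    problems = {}
    for package, wanted in expected.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != wanted:
            problems[package] = (wanted, found)
    return problems
```

Calling `version_mismatches(EXPECTED)` inside the `musique` conda environment should return an empty dict when both pins are in place.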
Our models were developed using a different (non-official) format of the dataset files. So to run our code, you'll first need to download the dataset files in the raw format.
python download_raw_data.py
Note that the officially released data and the data we've used here differ only in format (e.g., they use different names for JSON fields) and are not qualitatively different. Take a look at raw_data_to_official_format.py if you're interested.
We've done experiments on 4 datasets (MuSiQue-Ans, MuSiQue-Full, HotpotQA-20K, 2WikiMultihopQA-20K) with 4 multihop models (End2End Model, Select+Answer Model, Execution by End2End Model, Execution by Select+Answer Model) where possible. See Table 1. You can explore each combination using the instruction toggle below.
For each combination, you'll see instructions on how to (i) download a trained model, (ii) train a model from scratch, (iii) download model predictions, and (iv) generate predictions with a trained or downloaded model.
Our models are implemented in allennlp. If you're familiar with it, using the code should be pretty straightforward. The only difference is that instead of using the allennlp command, we use run.py as the entrypoint, which mainly loads allennlp_lib to load our allennlp code (readers, models, predictors, etc.).
MuSiQue-Ans
End2End Model [EE]
end2end_model_for_musique_ans_dataset
python download_models.py end2end_model_for_musique_ans_dataset
python run.py train experiment_configs/end2end_model_for_musique_ans_dataset.jsonnet \
--serialization-dir serialization_dir/end2end_model_for_musique_ans_dataset
python download_raw_predictions.py end2end_model_for_musique_ans_dataset
python run.py predict serialization_dir/end2end_model_for_musique_ans_dataset/model.tar.gz \
raw_data/musique_ans_dev.jsonl \
--output-file serialization_dir/end2end_model_for_musique_ans_dataset/predictions/musique_ans_dev.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_musique_ans_dataset/predictions/musique_ans_dev.jsonl
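The five commands above differ across experiments only in the experiment name, the input file, and the predictor. The sketch below regenerates them from those three values; `end2end_steps` is a hypothetical helper written from the command pattern shown here, not something shipped in the repository.

```python
import shlex


def end2end_steps(name, dev_file, predictor="transformer_rc",
                  batch_size=16, cuda_device=0):
    """Return the commands for one single-model experiment as argument lists."""
    model_dir = f"serialization_dir/{name}"
    # Predictions on a raw_data file keep that file's base name.
    prediction_file = f"{model_dir}/predictions/{dev_file.rsplit('/', 1)[-1]}"
    return {
        "download_model": ["python", "download_models.py", name],
        "train": ["python", "run.py", "train",
                  f"experiment_configs/{name}.jsonnet",
                  "--serialization-dir", model_dir],
        "download_predictions": ["python", "download_raw_predictions.py", name],
        "predict": ["python", "run.py", "predict",
                    f"{model_dir}/model.tar.gz", dev_file,
                    "--output-file", prediction_file,
                    "--predictor", predictor,
                    "--batch-size", str(batch_size),
                    "--cuda-device", str(cuda_device), "--silent"],
        "convert": ["python", "raw_predictions_to_official_format.py",
                    prediction_file],
    }


def as_shell(command):
    """Render an argument list as a single shell-safe string."""
    return shlex.join(command)
```

For instance, `as_shell(end2end_steps("end2end_model_for_musique_ans_dataset", "raw_data/musique_ans_dev.jsonl")["train"])` reproduces the training command listed above.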
Select+Answer Model [SA]
The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.
# Selector Model
select_and_answer_model_selector_for_musique_ans
python download_models.py select_and_answer_model_selector_for_musique_ans
python run.py train experiment_configs/select_and_answer_model_selector_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_selector_for_musique_ans
python download_raw_predictions.py select_and_answer_model_selector_for_musique_ans
python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_ans/model.tar.gz \
raw_data/musique_ans_train.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_train.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_ans/model.tar.gz \
raw_data/musique_ans_dev.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer Model
select_and_answer_model_answerer_for_musique_ans
python download_models.py select_and_answer_model_answerer_for_musique_ans
python run.py train experiment_configs/select_and_answer_model_answerer_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_answerer_for_musique_ans
python download_raw_predictions.py select_and_answer_model_answerer_for_musique_ans
python run.py predict serialization_dir/select_and_answer_model_answerer_for_musique_ans/model.tar.gz \
serialization_dir/select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_dev.jsonl \
--output-file serialization_dir/select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__select_and_answer_model_selector_for_musique_ans__predictions__musique_ans_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__select_and_answer_model_selector_for_musique_ans__predictions__musique_ans_dev.jsonl
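When one model consumes another model's predictions, the output file name in the commands above appears to be the upstream prediction path with every `/` replaced by `__`. The sketch below encodes that convention as inferred from the commands, not from the code; `chained_prediction_path` is a hypothetical helper.

```python
def chained_prediction_path(model_name, upstream_prediction_path):
    """Build the output path for a model that reads upstream predictions.

    Convention inferred from the commands above: flatten the upstream path
    by replacing "/" with "__" and place it under the model's predictions/.
    """
    flattened = upstream_prediction_path.replace("/", "__")
    return f"serialization_dir/{model_name}/predictions/{flattened}"
```

Keeping the upstream path in the file name makes it possible to tell which selector run produced the input for a given answerer prediction file.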
Execution by End2End Model [EX(EE)]
The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.
# Decomposer Model
execution_model_decomposer_for_musique_ans_and_full
python download_models.py execution_model_decomposer_for_musique_ans_and_full
python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full
python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full
python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
raw_data/musique_ans_dev.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Executor Model
execution_by_end2end_model_for_musique_ans
python download_models.py execution_by_end2end_model_for_musique_ans
python run.py train experiment_configs/execution_by_end2end_model_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/execution_by_end2end_model_for_musique_ans
python download_raw_predictions.py execution_by_end2end_model_for_musique_ans
python run.py predict serialization_dir/execution_by_end2end_model_for_musique_ans/model.tar.gz \
serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
--output-file serialization_dir/execution_by_end2end_model_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl \
--predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl
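The `--predictor-args` value is a compact JSON object, which is easy to mistype by hand. A minimal sketch for generating it is shown below; `predictor_args` is a hypothetical helper, its defaults simply mirror the values in the command above, and it makes no claim about what each option does inside the predictor.

```python
import json
import shlex


def predictor_args(predict_answerability, use_predicted_decomposition=True,
                   skip_distractor_paragraphs=False, **extra):
    """Serialize predictor options as compact JSON, in the order used above."""
    options = {
        "predict_answerability": predict_answerability,
        "skip_distractor_paragraphs": skip_distractor_paragraphs,
        "use_predicted_decomposition": use_predicted_decomposition,
    }
    options.update(extra)  # e.g. selector_model_path, num_select
    return json.dumps(options, separators=(",", ":"))


def quoted_predictor_args(*args, **kwargs):
    """Same as predictor_args, wrapped in shell-safe single quotes."""
    return shlex.quote(predictor_args(*args, **kwargs))
```

`quoted_predictor_args(False)` yields the exact argument used for the MuSiQue-Ans executor above, ready to paste after `--predictor-args`.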
Execution by Select+Answer Model [EX(SA)]
The system has 3 parts given below: (i) Decomposer Model (ii) Selector of Executor Model (iii) Answerer of Executor Model.
# Decomposer Model
execution_model_decomposer_for_musique_ans_and_full
python download_models.py execution_model_decomposer_for_musique_ans_and_full
python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full
python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full
python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
raw_data/musique_ans_dev.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Selector of Executor Model
execution_by_select_and_answer_model_selector_for_musique_ans
python download_models.py execution_by_select_and_answer_model_selector_for_musique_ans
python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans
python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_musique_ans
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz \
raw_data/musique_ans_single_hop_version_train.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_single_hop_version_train.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz \
raw_data/musique_ans_single_hop_version_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/predictions/musique_ans_single_hop_version_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer of Executor Model
execution_by_select_and_answer_model_answerer_for_musique_ans
python download_models.py execution_by_select_and_answer_model_answerer_for_musique_ans
python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_musique_ans.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans
python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_musique_ans
python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/model.tar.gz \
serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl \
--predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz","num_select":3}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_ans_dev.jsonl
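The final predict step of this three-part system depends on artifacts from the earlier steps: the answerer archive, the selector archive passed via `selector_model_path`, and the decomposer's predictions. A small pre-flight check, sketched below with the hypothetical names `REQUIRED` and `missing_artifacts`, can catch a skipped step before a long run starts.

```python
import os

# Paths taken from the final predict command above.
REQUIRED = [
    "serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_ans/model.tar.gz",
    "serialization_dir/execution_by_select_and_answer_model_selector_for_musique_ans/model.tar.gz",
    "serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_ans_dev.jsonl",
]


def missing_artifacts(paths, exists=os.path.exists):
    """Return the subset of `paths` that is not present on disk."""
    return [path for path in paths if not exists(path)]
```

An empty list from `missing_artifacts(REQUIRED)` means all three inputs are in place; otherwise the returned paths show which earlier step still needs to be run or downloaded.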
MuSiQue-Full
End2End Model [EE]
end2end_model_for_musique_full_dataset
python download_models.py end2end_model_for_musique_full_dataset
python run.py train experiment_configs/end2end_model_for_musique_full_dataset.jsonnet \
--serialization-dir serialization_dir/end2end_model_for_musique_full_dataset
python download_raw_predictions.py end2end_model_for_musique_full_dataset
python run.py predict serialization_dir/end2end_model_for_musique_full_dataset/model.tar.gz \
raw_data/musique_full_dev.jsonl \
--output-file serialization_dir/end2end_model_for_musique_full_dataset/predictions/musique_full_dev.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_musique_full_dataset/predictions/musique_full_dev.jsonl
Select+Answer Model [SA]
The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.
# Selector Model
select_and_answer_model_selector_for_musique_full
python download_models.py select_and_answer_model_selector_for_musique_full
python run.py train experiment_configs/select_and_answer_model_selector_for_musique_full.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_selector_for_musique_full
python download_raw_predictions.py select_and_answer_model_selector_for_musique_full
python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_full/model.tar.gz \
raw_data/musique_full_train.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_train.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/select_and_answer_model_selector_for_musique_full/model.tar.gz \
raw_data/musique_full_dev.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer Model
select_and_answer_model_answerer_for_musique_full
python download_models.py select_and_answer_model_answerer_for_musique_full
python run.py train experiment_configs/select_and_answer_model_answerer_for_musique_full.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_answerer_for_musique_full
python download_raw_predictions.py select_and_answer_model_answerer_for_musique_full
python run.py predict serialization_dir/select_and_answer_model_answerer_for_musique_full/model.tar.gz \
serialization_dir/select_and_answer_model_selector_for_musique_full/predictions/musique_full_dev.jsonl \
--output-file serialization_dir/select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__select_and_answer_model_selector_for_musique_full__predictions__musique_full_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__select_and_answer_model_selector_for_musique_full__predictions__musique_full_dev.jsonl
Execution by End2End Model [EX(EE)]
The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.
# Decomposer Model
execution_model_decomposer_for_musique_ans_and_full
python download_models.py execution_model_decomposer_for_musique_ans_and_full
python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full
python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full
python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
raw_data/musique_full_dev.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Executor Model
execution_by_end2end_model_for_musique_full
python download_models.py execution_by_end2end_model_for_musique_full
python run.py train experiment_configs/execution_by_end2end_model_for_musique_full.jsonnet \
--serialization-dir serialization_dir/execution_by_end2end_model_for_musique_full
python download_raw_predictions.py execution_by_end2end_model_for_musique_full
python run.py predict serialization_dir/execution_by_end2end_model_for_musique_full/model.tar.gz \
serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
--output-file serialization_dir/execution_by_end2end_model_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl \
--predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":true,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl
Execution by Select+Answer Model [EX(SA)]
The system has 3 parts given below: (i) Decomposer Model (ii) Selector of Executor Model (iii) Answerer of Executor Model.
# Decomposer Model
execution_model_decomposer_for_musique_ans_and_full
python download_models.py execution_model_decomposer_for_musique_ans_and_full
python run.py train experiment_configs/execution_model_decomposer_for_musique_ans_and_full.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_musique_ans_and_full
python download_raw_predictions.py execution_model_decomposer_for_musique_ans_and_full
python run.py predict serialization_dir/execution_model_decomposer_for_musique_ans_and_full/model.tar.gz \
raw_data/musique_full_dev.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Selector of Executor Model
execution_by_select_and_answer_model_selector_for_musique_full
python download_models.py execution_by_select_and_answer_model_selector_for_musique_full
python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_musique_full.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full
python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_musique_full
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz \
raw_data/musique_full_single_hop_version_train.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/predictions/musique_full_single_hop_version_train.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz \
raw_data/musique_full_single_hop_version_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/predictions/musique_full_single_hop_version_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer of Executor Model
execution_by_select_and_answer_model_answerer_for_musique_full
python download_models.py execution_by_select_and_answer_model_answerer_for_musique_full
python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_musique_full.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full
python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_musique_full
python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/model.tar.gz \
serialization_dir/execution_model_decomposer_for_musique_ans_and_full/predictions/musique_full_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl \
--predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":true,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_musique_full/model.tar.gz","num_select":3}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_musique_full/predictions/serialization_dir__execution_model_decomposer_for_musique_ans_and_full__predictions__musique_full_dev.jsonl
HotpotQA-20K
End2End Model [EE]
end2end_model_for_hotpotqa_20k_dataset
python download_models.py end2end_model_for_hotpotqa_20k_dataset
python run.py train experiment_configs/end2end_model_for_hotpotqa_20k_dataset.jsonnet \
--serialization-dir serialization_dir/end2end_model_for_hotpotqa_20k_dataset
python download_raw_predictions.py end2end_model_for_hotpotqa_20k_dataset
python run.py predict serialization_dir/end2end_model_for_hotpotqa_20k_dataset/model.tar.gz \
raw_data/hotpotqa_dev_20k.jsonl \
--output-file serialization_dir/end2end_model_for_hotpotqa_20k_dataset/predictions/hotpotqa_dev_20k.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_hotpotqa_20k_dataset/predictions/hotpotqa_dev_20k.jsonl
Select+Answer Model [SA]
The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.
# Selector Model
select_and_answer_model_selector_for_hotpotqa_20k
python download_models.py select_and_answer_model_selector_for_hotpotqa_20k
python run.py train experiment_configs/select_and_answer_model_selector_for_hotpotqa_20k.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k
python download_raw_predictions.py select_and_answer_model_selector_for_hotpotqa_20k
python run.py predict serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/model.tar.gz \
raw_data/hotpotqa_train_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_train_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/model.tar.gz \
raw_data/hotpotqa_dev_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_dev_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer Model
select_and_answer_model_answerer_for_hotpotqa_20k
python download_models.py select_and_answer_model_answerer_for_hotpotqa_20k
python run.py train experiment_configs/select_and_answer_model_answerer_for_hotpotqa_20k.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k
python download_raw_predictions.py select_and_answer_model_answerer_for_hotpotqa_20k
python run.py predict serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/model.tar.gz \
serialization_dir/select_and_answer_model_selector_for_hotpotqa_20k/predictions/hotpotqa_dev_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/predictions/serialization_dir__select_and_answer_model_selector_for_hotpotqa_20k__predictions__hotpotqa_dev_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_hotpotqa_20k/predictions/serialization_dir__select_and_answer_model_selector_for_hotpotqa_20k__predictions__hotpotqa_dev_20k.jsonl
2WikiMultihopQA-20K
End2End Model [EE]
end2end_model_for_2wikimultihopqa_20k_dataset
python download_models.py end2end_model_for_2wikimultihopqa_20k_dataset
python run.py train experiment_configs/end2end_model_for_2wikimultihopqa_20k_dataset.jsonnet \
--serialization-dir serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset
python download_raw_predictions.py end2end_model_for_2wikimultihopqa_20k_dataset
python run.py predict serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/model.tar.gz \
raw_data/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/end2end_model_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl
Select+Answer Model [SA]
The system has 2 parts given below: (i) Selector Model (ii) Answerer Model.
# Selector Model
select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset
python download_models.py select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset
python run.py train experiment_configs/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset
python download_raw_predictions.py select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset
python run.py predict serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/model.tar.gz \
raw_data/2wikimultihopqa_train_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_train_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/model.tar.gz \
raw_data/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer Model
select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset
python download_models.py select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset
python run.py train experiment_configs/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset.jsonnet \
--serialization-dir serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset
python download_raw_predictions.py select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset
python run.py predict serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/model.tar.gz \
serialization_dir/select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset/predictions/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/predictions/serialization_dir__select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset__predictions__2wikimultihopqa_dev_20k.jsonl \
--predictor transformer_rc --batch-size 16 --cuda-device 0 --silent
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/select_and_answer_model_answerer_for_2wikimultihopqa_20k_dataset/predictions/serialization_dir__select_and_answer_model_selector_for_2wikimultihopqa_20k_dataset__predictions__2wikimultihopqa_dev_20k.jsonl
Execution by End2End Model [EX(EE)]
The system has 2 parts given below: (i) Decomposer Model (ii) Executor Model.
# Decomposer Model
execution_model_decomposer_for_2wikimultihopqa
python download_models.py execution_model_decomposer_for_2wikimultihopqa
python run.py train experiment_configs/execution_model_decomposer_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_2wikimultihopqa
python download_raw_predictions.py execution_model_decomposer_for_2wikimultihopqa
python run.py predict serialization_dir/execution_model_decomposer_for_2wikimultihopqa/model.tar.gz \
raw_data/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Executor Model
execution_by_end2end_model_for_2wikimultihopqa
python download_models.py execution_by_end2end_model_for_2wikimultihopqa
python run.py train experiment_configs/execution_by_end2end_model_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_by_end2end_model_for_2wikimultihopqa
python download_raw_predictions.py execution_by_end2end_model_for_2wikimultihopqa
python run.py predict serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/model.tar.gz \
serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl \
--predictor multi_step_end2end_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_end2end_model_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl
Execution by Select+Answer Model [EX(SA)]
The system has three parts, given below: (i) Decomposer Model, (ii) Selector of Executor Model, and (iii) Answerer of Executor Model.
# Decomposer Model
execution_model_decomposer_for_2wikimultihopqa
python download_models.py execution_model_decomposer_for_2wikimultihopqa
python run.py train experiment_configs/execution_model_decomposer_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_model_decomposer_for_2wikimultihopqa
python download_raw_predictions.py execution_model_decomposer_for_2wikimultihopqa
python run.py predict serialization_dir/execution_model_decomposer_for_2wikimultihopqa/model.tar.gz \
raw_data/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
--predictor question_translator --batch-size 16 --cuda-device 0 --silent
# Selector of Executor Model
execution_by_select_and_answer_model_selector_for_2wikimultihopqa
python download_models.py execution_by_select_and_answer_model_selector_for_2wikimultihopqa
python run.py train experiment_configs/execution_by_select_and_answer_model_selector_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa
python download_raw_predictions.py execution_by_select_and_answer_model_selector_for_2wikimultihopqa
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz \
raw_data/2wikimultihopqa_single_hop_version_train_20k.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/predictions/2wikimultihopqa_single_hop_version_train_20k.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
python run.py predict serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz \
raw_data/2wikimultihopqa_single_hop_version_dev.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/predictions/2wikimultihopqa_single_hop_version_dev.jsonl \
--predictor inplace_text_ranker --batch-size 16 --cuda-device 0 --silent
# Answerer of Executor Model
execution_by_select_and_answer_model_answerer_for_2wikimultihopqa
python download_models.py execution_by_select_and_answer_model_answerer_for_2wikimultihopqa
python run.py train experiment_configs/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa.jsonnet \
--serialization-dir serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa
python download_raw_predictions.py execution_by_select_and_answer_model_answerer_for_2wikimultihopqa
python run.py predict serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/model.tar.gz \
serialization_dir/execution_model_decomposer_for_2wikimultihopqa/predictions/2wikimultihopqa_dev_20k.jsonl \
--output-file serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl \
--predictor multi_step_select_and_answer_transformer_rc --batch-size 16 --cuda-device 0 --silent \
--predictor-args '{"predict_answerability":false,"skip_distractor_paragraphs":false,"use_predicted_decomposition":true,"selector_model_path":"serialization_dir/execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz","num_select":3}'
# If you want to convert predictions to the official format, run:
python raw_predictions_to_official_format.py serialization_dir/execution_by_select_and_answer_model_answerer_for_2wikimultihopqa/predictions/serialization_dir__execution_model_decomposer_for_2wikimultihopqa__predictions__2wikimultihopqa_dev_20k.jsonl
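The `--predictor-args` value in the executor commands above is a JSON object passed as a single shell argument, which is easy to break when editing by hand. As a minimal sketch, the string can be generated with Python's standard `json` and `shlex` modules. The helper name `predictor_args` is hypothetical; the keys and values are copied from the commands in this README, and nothing here is taken from the repository's code.

```python
import json
import shlex
from typing import Optional


def predictor_args(selector_model_path: Optional[str] = None,
                   num_select: Optional[int] = None) -> str:
    """Return the compact JSON string used for --predictor-args.

    Assumption: only the keys shown in the README commands are needed.
    """
    args = {
        "predict_answerability": False,
        "skip_distractor_paragraphs": False,
        "use_predicted_decomposition": True,
    }
    # The two extra keys appear only in the EX(SA) answerer command.
    if selector_model_path is not None:
        args["selector_model_path"] = selector_model_path
    if num_select is not None:
        args["num_select"] = num_select
    # Compact separators reproduce the whitespace-free form used in the README.
    return json.dumps(args, separators=(",", ":"))


ex_ee_args = predictor_args()
ex_sa_args = predictor_args(
    selector_model_path=(
        "serialization_dir/"
        "execution_by_select_and_answer_model_selector_for_2wikimultihopqa/model.tar.gz"
    ),
    num_select=3,
)

# shlex.quote wraps the JSON in single quotes so it is safe to paste into a shell.
print("--predictor-args", shlex.quote(ex_ee_args))
print("--predictor-args", shlex.quote(ex_sa_args))
```

Generating the string this way guarantees valid JSON and correct shell quoting, for example if you change `num_select` or point `selector_model_path` at a different checkpoint.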
If you use this in your work, please cite us:
@article{trivedi2021musique,
title={{M}u{S}i{Q}ue: Multihop Questions via Single-hop Question Composition},
author={Trivedi, Harsh and Balasubramanian, Niranjan and Khot, Tushar and Sabharwal, Ashish},
journal={Transactions of the Association for Computational Linguistics},
year={2022},
publisher={MIT Press}
}