This repository contains the implementation of the system described in the paper "A Simple Approach to Jointly Rank Passages and Select Relevant Sentences in the OBQA Context".
The motivation of this work is to build a joint model for QA that performs both passage ranking and relevant sentence classification. Compared to systems that rely on a separate model for each task, our joint model has fewer parameters while achieving reasonable performance. Please refer to the paper for details.
The code works with Python 3.7. If you use conda, you can set up the environment as follows:
conda create -n env_name python=3.7
conda activate env_name
Then install the dependencies listed in requirements.txt:
pip install -r requirements.txt
You can download the preprocessed HotpotQA data from the following link: data. In the preprocessed data, the question and passage are concatenated as input, and the special token </s> is injected before each sentence in a passage.
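To make the input format concrete, here is a minimal sketch of how one question and one passage could be turned into a single input string. The helper name build_input and the single-space separator between the question and the passage are assumptions for illustration; the released preprocessing may use a different separator.

```python
from typing import List

# Special token the README says is injected before every sentence.
SENT_MARKER = "</s>"


def build_input(question: str, sentences: List[str]) -> str:
    """Hypothetical helper: question followed by the passage, with </s> before each sentence.

    Assumption: parts are joined by a single space; the released data may use a
    different separator between question and passage.
    """
    parts = [question.strip()]
    for sentence in sentences:
        parts.append(f"{SENT_MARKER} {sentence.strip()}")
    return " ".join(parts)


if __name__ == "__main__":
    text = build_input(
        "Which city hosted the event?",
        ["The event took place in Paris.", "It lasted two days."],
    )
    print(text)
    # Which city hosted the event? </s> The event took place in Paris. </s> It lasted two days.
```

With one marker per sentence, the number of </s> markers in an input equals the number of sentences in its passage, which is a quick sanity check when inspecting the data.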
Download the data into the dataset/ folder. Two toy examples are provided in dataset/toy.jsonl.
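To inspect the toy examples, a small reader for the jsonl format (one JSON object per line) is enough. This is a sketch: it assumes nothing about the field names in the repository's files and only prints the keys of each record.

```python
import json
import os
from typing import Any, Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


if __name__ == "__main__":
    path = "dataset/toy.jsonl"  # toy file shipped with the repository
    if os.path.exists(path):
        for i, record in enumerate(read_jsonl(path)):
            print(i, sorted(record))  # field names of each example
    else:
        print(f"{path} not found; run this from the repository root")
```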
We provide a function in process_data/process_hotpotqa.py that preprocesses the raw HotpotQA data. You can use it to convert the data to the jsonl format used in the experiments.
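The sketch below shows one plausible way to turn a raw HotpotQA example into per-passage jsonl records that carry a passage label and per-sentence labels. It relies on the public HotpotQA fields _id, question, context (a list of [title, sentences] pairs) and supporting_facts (a list of [title, sentence_index] pairs). The function name hotpot_to_records and the output fields (text, para_label, sent_labels) are hypothetical; they are not the schema produced by process_data/process_hotpotqa.py.

```python
import json
from typing import Any, Dict, List, Set


def hotpot_to_records(example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical conversion: one record per (question, passage) pair."""
    # Map each gold passage title to the indices of its supporting sentences.
    gold: Dict[str, Set[int]] = {}
    for title, sent_id in example["supporting_facts"]:
        gold.setdefault(title, set()).add(sent_id)

    records = []
    for title, sentences in example["context"]:
        support = gold.get(title, set())
        # Question followed by the passage, with </s> injected before each sentence.
        text = " ".join([example["question"]] + [f"</s> {s.strip()}" for s in sentences])
        records.append({
            "id": example["_id"],
            "title": title,
            "text": text,
            "para_label": int(title in gold),  # 1 if the passage holds supporting facts
            "sent_labels": [int(i in support) for i in range(len(sentences))],
        })
    return records


if __name__ == "__main__":
    raw = {
        "_id": "toy-1",
        "question": "Where was the author born?",
        "context": [
            ["Author", ["The author is a novelist.", " She was born in Lyon."]],
            ["Other", ["An unrelated passage."]],
        ],
        "supporting_facts": [["Author", 1]],
    }
    for record in hotpot_to_records(raw):
        print(json.dumps(record))  # one jsonl line per passage
```

Emitting one record per passage lets a single example supply both a passage-ranking target and sentence-selection targets, which fits the joint setup described above.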
You can train a model with the following command. We use RoBERTa here, but any other encoder-based model, such as BERT or ELECTRA, can be used by changing --model_name_or_path and --tokenizer_name.
python model_train/train_para.py \
--model_name_or_path roberta-base \
--tokenizer_name roberta-base \
--task_name para \
--data_dir dataset/para_training/ \
--max_seq_length 512 \
--output_dir path_to_save_model \
--do_train \
--do_eval \
--overwrite_output_dir \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 2 \
--per_device_eval_batch_size 32 \
--learning_rate 1e-5 \
--num_train_epochs 5 \
--save_steps 10000 \
--logging_steps 10000 \
--use_sent_loss
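With these flags, each optimizer step sees 4 × 2 = 8 training examples per device (per_device_train_batch_size times gradient_accumulation_steps). The --use_sent_loss flag suggests that a sentence-level loss is added to the passage-ranking loss. The sketch below illustrates such a joint objective in plain Python. It assumes binary cross-entropy on a passage logit and on each per-sentence logit, with the sentence losses averaged and added under a weight sent_weight. These choices and the function names are assumptions for illustration, not the exact loss in model_train/train_para.py.

```python
import math
from typing import Sequence


def bce_with_logit(logit: float, label: int) -> float:
    """Numerically stable binary cross-entropy for a single logit and 0/1 label."""
    # Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def joint_loss(
    para_logit: float,
    para_label: int,
    sent_logits: Sequence[float],
    sent_labels: Sequence[int],
    use_sent_loss: bool = True,
    sent_weight: float = 1.0,  # hypothetical weighting between the two tasks
) -> float:
    """Passage-ranking loss plus an optional averaged sentence-selection loss."""
    loss = bce_with_logit(para_logit, para_label)
    if use_sent_loss and sent_logits:
        sent_loss = sum(
            bce_with_logit(x, y) for x, y in zip(sent_logits, sent_labels)
        ) / len(sent_logits)
        loss += sent_weight * sent_loss
    return loss


if __name__ == "__main__":
    # One relevant passage whose second sentence is the supporting fact.
    print(joint_loss(2.0, 1, [-1.5, 3.0], [0, 1]))
    print(joint_loss(2.0, 1, [-1.5, 3.0], [0, 1], use_sent_loss=False))
```

Setting use_sent_loss=False reduces this sketch to passage ranking alone, presumably analogous to running the command without --use_sent_loss.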
If you find this work useful, please cite:
@article{Luo2021ASA,
title={A Simple Approach to Jointly Rank Passages and Select Relevant Sentences in the OBQA Context},
author={Man Luo and Shuguang Chen and Chitta Baral},
journal={ArXiv},
year={2021},
volume={abs/2109.10497}
}
Feel free to get in touch by email at [email protected].