CQI

Comparative Question Answering Project

Comparative Question Identification Model

UHH -

Due to size constraints, the trained model is not included in this repository, but it can easily be recreated as explained below.

The Comparative Question Identification Model is a transformer-based classification model for identifying comparative questions.

Dataset

It uses a mix of different datasets; the full dataset can be found in final_dataset_english.tsv. The train, test, and validation sets were created from this file, and if they are missing, the scripts will recreate them. As an overview, the dataset has 9,876 entries, equally divided between comparative and non-comparative questions.
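For orientation, the splits could be recreated from the full TSV roughly as in the sketch below; the column name "label" and the 80/10/10 ratio are assumptions and may differ from what the project's scripts actually use.

# Minimal sketch: recreate train/test/validation splits from final_dataset_english.tsv.
# The "label" column name and the 80/10/10 split ratio are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("final_dataset_english.tsv", sep="\t")

# Carve out 80% for training, then split the remainder evenly into validation and test.
train_df, rest_df = train_test_split(df, test_size=0.2, stratify=df["label"], random_state=42)
validate_df, test_df = train_test_split(rest_df, test_size=0.5, stratify=rest_df["label"], random_state=42)

train_df.to_csv("train.tsv", sep="\t", index=False)
validate_df.to_csv("validate.tsv", sep="\t", index=False)
test_df.to_csv("test.tsv", sep="\t", index=False)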

Metrics

Multiple pre-trained models were tested. A full list of the models, along with the best metrics obtained during hyperparameter training, can be found in the file best_models_metrics.json. The following metrics belong to the model distilbert-base-uncased fine-tuned on SST, available on Hugging Face 🤗.

  • accuracy: 0.9696
  • f1: 0.9708
  • loss: 0.1257
  • precision: 0.9615
  • recall: 0.9803
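These metrics can be reproduced with a standard compute_metrics callback passed to the Hugging Face Trainer; the sketch below shows one common way to compute them and is an assumption, not the project's exact code.

# Sketch of a compute_metrics callback producing accuracy, F1, precision, and recall.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }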

Training

Before attempting training, make sure you have installed all the requirements listed in requirements.txt. If you don't want to report to WandB, comment this out in the training arguments. If you have CUDA available, indicate this in the code to speed up training. The model to be trained is set to distilbert-base-uncased fine-tuned on SST.

After these considerations, the model can be trained simply by running:

python training.py
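For orientation, the training setup roughly corresponds to a sketch like the one below; the checkpoint id, file names, column names, and hyperparameter values here are assumptions, and training.py remains the authoritative script. Disabling WandB corresponds to the report_to argument, and the Trainer uses CUDA automatically when a GPU is available.

# Minimal fine-tuning sketch (assumed setup; training.py is the authoritative script).
import torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Assumed TSV splits and column name produced by the dataset scripts.
data = load_dataset("csv", data_files={"train": "train.tsv", "validation": "validate.tsv"},
                    delimiter="\t")
data = data.map(lambda x: tokenizer(x["question"], truncation=True, padding="max_length"),
                batched=True)

args = TrainingArguments(
    output_dir="cqi_model",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    report_to="none",   # change to "wandb" to report to Weights & Biases
)

# The Trainer picks up CUDA automatically when a GPU is available.
print("CUDA available:", torch.cuda.is_available())

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"], eval_dataset=data["validation"])
trainer.train()
trainer.save_model("cqi_model")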

Demo and API

The demo is easy to run once the model has been created. It uses Gradio, so it can be used from your browser of choice. With the model in place, you can start the demo by executing the following:

python demo.py
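For reference, a minimal Gradio demo along these lines could look as follows; the saved model path and the interface labels are assumptions, not necessarily what demo.py contains.

# Minimal Gradio demo sketch (assumed layout, not necessarily the project's demo.py).
import gradio as gr
from transformers import pipeline

# Assumed path of the locally trained model.
classifier = pipeline("text-classification", model="cqi_model")

def is_comparative(question: str) -> str:
    result = classifier(question)[0]
    return f"{result['label']} (score: {result['score']:.3f})"

gr.Interface(fn=is_comparative,
             inputs=gr.Textbox(label="Question"),
             outputs=gr.Textbox(label="Prediction"),
             title="Comparative Question Identification").launch()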

An API was created to access the model through HTTP requests. It is implemented in main.py and can be started by running the following:

python main.py

The API is based on FastAPI. Once it is running, it accepts GET requests at the following endpoint:

http://127.0.0.1:8000/is_comparative/Hello_World

with your input question at the end of the path. The API returns a positive or negative answer as JSON.
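For illustration, the endpoint can also be queried from Python; the exact shape of the JSON response depends on main.py, so treat the example below as an assumption and check the actual schema there.

# Query the running API (assumed response format; see main.py for the exact schema).
import requests

question = "Is_Python_better_than_Java"
response = requests.get(f"http://127.0.0.1:8000/is_comparative/{question}")
print(response.json())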

Hyperparameter Training

If you wish to test a new pre-trained model, modify hyperparameter_training.py with your desired options and then run the following:

python hyperparameter_training.py
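As a rough illustration of how such a search could be wired up, the sketch below uses the Trainer's built-in hyperparameter_search with the optuna backend; the model name, search space, and data handling are placeholders, and hyperparameter_training.py defines the options actually used.

# Sketch of a hyperparameter search with Trainer.hyperparameter_search (optuna backend).
# Model name, search space, and file/column names are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # the pre-trained model you want to test
tokenizer = AutoTokenizer.from_pretrained(model_name)

def model_init():
    return AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]),
    }

# Assumed TSV splits and column name, as in the training sketch above.
data = load_dataset("csv", data_files={"train": "train.tsv", "validation": "validate.tsv"},
                    delimiter="\t")
data = data.map(lambda x: tokenizer(x["question"], truncation=True, padding="max_length"),
                batched=True)

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hp_search", evaluation_strategy="epoch"),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
)

best_run = trainer.hyperparameter_search(direction="minimize", hp_space=hp_space, n_trials=10)
print(best_run)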