Issue with Tutorial2_Finetune_a_model_on_your_data #199

anirbansaha96 · 2020-07-07T09:12:56Z

I am trying to fine-tune a model with my custom data set in SQuAD format. I'm facing the following error:

IndexError: Cannot choose from an empty sequence


The above exception was the direct cause of the following exception:


IndexError                                Traceback (most recent call last)
<ipython-input-6-667d6da5a3ea> in <module>()
      1 #train_data = "data/squad20"
      2 train_data = "/content/"
----> 3 reader.train(data_dir=train_data, train_filename="COI_json.json", use_gpu=False, n_epochs=1, save_dir="my_model")

4 frames
/usr/lib/python3.6/multiprocessing/pool.py in next(self, timeout)
    733         if success:
    734             return value
--> 735         raise value
    736 
    737     __next__ = next                    # XXX

IndexError: Cannot choose from an empty sequence

It occurred when I ran the following code:

train_data = "/content/" 
reader.train(data_dir=train_data, train_filename="COI_json.json", use_gpu=False, n_epochs=1, save_dir="my_model")

For a sample of the data in "COI_json.json", please have a look here.


Timoeller · 2020-07-07T09:17:29Z

Hey @anirbansaha96, it seems your data does not contain any questions. Have a look at the end of the snippet you posted, where it says:

"qas": []

So qas is an empty list, which is why it "Cannot choose from an empty sequence".
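A quick way to check this before calling reader.train is to scan the file for paragraphs whose qas list is missing or empty. The sketch below assumes the standard SQuAD layout (data -> paragraphs -> qas). The helper names find_empty_qas and drop_empty_paragraphs are made up for illustration and are not part of the library. Whether training succeeds once such paragraphs are removed is not confirmed in this thread.

```python
import json


def find_empty_qas(squad_dict):
    """Return (article_title, paragraph_index) for each paragraph without questions."""
    empty = []
    for article in squad_dict.get("data", []):
        title = article.get("title", "<untitled>")
        for idx, paragraph in enumerate(article.get("paragraphs", [])):
            # A missing "qas" key and an empty list are treated the same way.
            if not paragraph.get("qas"):
                empty.append((title, idx))
    return empty


def drop_empty_paragraphs(squad_dict):
    """Return a new dataset dict that keeps only paragraphs with at least one question."""
    cleaned_articles = []
    for article in squad_dict.get("data", []):
        kept = [p for p in article.get("paragraphs", []) if p.get("qas")]
        if kept:  # skip articles that end up with no paragraphs at all
            new_article = dict(article)
            new_article["paragraphs"] = kept
            cleaned_articles.append(new_article)
    cleaned = dict(squad_dict)
    cleaned["data"] = cleaned_articles
    return cleaned


def load_squad(path):
    """Load a SQuAD-style JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, find_empty_qas(load_squad("/content/COI_json.json")) would list the paragraphs that still need annotated questions. If it lists every paragraph, the file contains contexts only and there is nothing for the reader to train on.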

anirbansaha96 · 2020-07-07T09:29:47Z

What I wanted was an implementation similar to the cdQA suite, available here.

Specifically something similar to this:

cdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT
cdqa_pipeline.fit_reader('path-to-custom-squad-like-dataset.json')
cdqa_pipeline.dump_reader('path-to-save-bert-reader.joblib')

The idea is that the model could perhaps learn through masked language modelling to pick up the domain-specific semantics. Is there a way to achieve this?

Timoeller self-assigned this Jul 7, 2020

Timoeller added topic:file_converter question labels Jul 7, 2020

anirbansaha96 mentioned this issue Jul 7, 2020

Fine-tuning the Reader on domain data #192

Closed

anirbansaha96 closed this as completed Jul 13, 2020
