Issue with Tutorial2_Finetune_a_model_on_your_data #199

Closed
anirbansaha96 opened this issue Jul 7, 2020 · 2 comments

@anirbansaha96
Contributor

I am trying to fine-tune a model with my custom data set in SQuAD format. I'm facing the following error:

IndexError: Cannot choose from an empty sequence


The above exception was the direct cause of the following exception:


IndexError                                Traceback (most recent call last)
<ipython-input-6-667d6da5a3ea> in <module>()
      1 #train_data = "data/squad20"
      2 train_data = "/content/"
----> 3 reader.train(data_dir=train_data, train_filename="COI_json.json", use_gpu=False, n_epochs=1, save_dir="my_model")

4 frames
/usr/lib/python3.6/multiprocessing/pool.py in next(self, timeout)
    733         if success:
    734             return value
--> 735         raise value
    736 
    737     __next__ = next                    # XXX

IndexError: Cannot choose from an empty sequence

It occurred when I ran the following code:

train_data = "/content/" 
reader.train(data_dir=train_data, train_filename="COI_json.json", use_gpu=False, n_epochs=1, save_dir="my_model")

To get some insight into the data in "COI_json.json", please take a look here.

@Timoeller
Contributor

Hey @anirbansaha96, it seems to me that your data does not contain any questions. Have a look at the end of the snippet you posted, where it says:

"qas": []

So qas is an empty list, which is why you get "Cannot choose from an empty sequence".
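
To check a whole file for this before training, something along these lines might help. It is only a rough sketch and assumes the usual SQuAD layout (data -> paragraphs -> qas); the helper names find_empty_qas and drop_empty_paragraphs are made up for illustration and are not part of Haystack.

```python
import json

def find_empty_qas(squad_dict):
    """Return (article_title, paragraph_index) for each paragraph whose "qas" is missing or empty."""
    empty = []
    for article in squad_dict.get("data", []):
        title = article.get("title", "<untitled>")
        for idx, paragraph in enumerate(article.get("paragraphs", [])):
            if not paragraph.get("qas"):
                empty.append((title, idx))
    return empty

def drop_empty_paragraphs(squad_dict):
    """Return a copy of the dataset that keeps only paragraphs with at least one question."""
    cleaned = {key: value for key, value in squad_dict.items() if key != "data"}
    cleaned["data"] = []
    for article in squad_dict.get("data", []):
        kept = [p for p in article.get("paragraphs", []) if p.get("qas")]
        if kept:
            cleaned["data"].append({**article, "paragraphs": kept})
    return cleaned

# Tiny in-memory example standing in for a real file (hypothetical content).
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "doc",
            "paragraphs": [
                {
                    "context": "Haystack is a QA framework.",
                    "qas": [
                        {
                            "id": "1",
                            "question": "What is Haystack?",
                            "is_impossible": False,
                            "answers": [{"text": "a QA framework", "answer_start": 12}],
                        }
                    ],
                },
                {"context": "A paragraph without questions.", "qas": []},
            ],
        }
    ],
}

print(find_empty_qas(sample))

# For a file on disk, load it first, e.g.:
#   with open("/content/COI_json.json", encoding="utf-8") as f:
#       squad_dict = json.load(f)
```

Dropping the empty paragraphs only removes the entries that cannot be trained on. If every paragraph has an empty qas list, nothing is left and the data still needs annotated questions and answers.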

@anirbansaha96
Contributor Author

What I wanted was an implementation similar to the cdQA suite, available here.

Specifically something similar to this:

cdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT
cdqa_pipeline.fit_reader('path-to-custom-squad-like-dataset.json')
cdqa_pipeline.dump_reader('path-to-save-bert-reader.joblib')

The idea is that the model could perhaps learn the domain-specific semantics through Masked Language Modelling. Is there a way to achieve something like this?
