ERROR - farm.data_handler.processor - Error message: Answer using start/end indices is 'x' while gold label text is 'ax' #492

Closed
sbhttchryy opened this issue Oct 15, 2020 · 4 comments

sbhttchryy commented Oct 15, 2020

Hi, I am running into this error. In every case, the answer's start and end indices point to only a part of the gold label text rather than to the whole text. Could you please help me with this? The log is:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/data_silo.py", line 124, in _dataset_from_chunk
dataset = processor.dataset_from_dicts(dicts=dicts, indices=indices)
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/processor.py", line 1144, in dataset_from_dicts
dataset, tensor_names = self._create_dataset(keep_baskets=False)
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/processor.py", line 308, in _create_dataset
features_flat.extend(sample.features)
TypeError: 'NoneType' object is not iterable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/home/krystian/haystack/haystack/src/models/finetuned_models/QA_MC_BERT.py", line 8, in <module>
reader.train(data_dir=train_data, train_filename="Final_session_1_2_no_null_1.json", use_gpu=True, n_epochs=1, save_dir="test_model_no_null_entries")
File "/data/home/krystian/haystack/haystack/reader/farm.py", line 198, in train
data_silo = DataSilo(processor=processor, batch_size=batch_size, distributed=False, max_processes=num_processes)
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/data_silo.py", line 105, in __init__
self._load_data()
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/data_silo.py", line 207, in _load_data
self.data["train"], self.tensor_names = self._get_dataset(train_file)
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/data_silo.py", line 176, in _get_dataset
for dataset, tensor_names in results:
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
TypeError: 'NoneType' object is not iterable
Thank you.
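
The mismatch described above can be checked directly on the training file before it reaches the processor. This is a minimal sketch, assuming the file follows the SQuAD layout (`data` → `paragraphs` → `context` and `qas` → `answers` with `text` and `answer_start`). The helper name `find_offset_mismatches` and the toy data are hypothetical, and the comparison is a simplification that may differ from the check FARM performs internally.

```python
def find_offset_mismatches(squad):
    """Yield (question_id, span_text, gold_text) for answers whose
    character offsets do not reproduce the gold answer text."""
    for article in squad.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph.get("qas", []):
                for answer in qa.get("answers", []):
                    gold = answer["text"]
                    start = answer["answer_start"]
                    # Text actually covered by the stored offset.
                    span = context[start:start + len(gold)]
                    if span != gold:
                        yield qa.get("id"), span, gold


# Toy data (hypothetical): the second answer's offset is off by one.
toy = {
    "data": [{
        "paragraphs": [{
            "context": "The cat sat on the mat.",
            "qas": [
                {"id": "q1", "question": "Who sat?",
                 "answers": [{"text": "cat", "answer_start": 4}]},
                {"id": "q2", "question": "Sat on what?",
                 "answers": [{"text": "mat", "answer_start": 20}]},
            ],
        }]
    }]
}

mismatches = list(find_offset_mismatches(toy))
for question_id, span, gold in mismatches:
    print(f"{question_id}: offsets give {span!r}, gold text is {gold!r}")
```

Running this over the real training file (loaded with `json.load`) lists the questions whose labels need correcting.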

@sbhttchryy
Author

Dear developers, do you have any clue about this problem? I am trying to fine-tune the "mrm8488/bert-multi-cased-finetuned-xquadv1" model on German QA pairs from a particular domain.

@tholor
Member

tholor commented Oct 16, 2020

Hey @sbhttchryy,
Are you sure you don't have any null entries in your JSON file? These can cause trouble, as you already reported in #488.

Can you maybe share a small script + toy dataset so that we can reproduce this on our side?
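
One way to rule out null entries is to walk the parsed JSON and list every path whose value is `null`. This is a rough sketch, not part of FARM or Haystack. The helper name `find_nulls` and the toy document are hypothetical.

```python
import json


def find_nulls(node, path="$"):
    """Recursively yield the path of every null (None) value in parsed JSON."""
    if node is None:
        yield path
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_nulls(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_nulls(value, f"{path}[{index}]")


# Toy document (hypothetical) with two null entries.
toy = json.loads(
    '{"data": [{"paragraphs": [{"context": null, '
    '"qas": [{"question": "Q?", "answers": null}]}]}]}'
)

null_paths = list(find_nulls(toy))
for null_path in null_paths:
    print(null_path)
```

For a real file, replace the toy document with the result of `json.load` on the training file. An empty result means no null values are present.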

@Timoeller
Contributor

Hey @sbhttchryy, I presume you created the dataset with our annotation tool https://annotate.deepset.ai/index.html?

There we have some issues related to offsets, e.g. #403. We tried to fix them, but because of differing OS-related settings it is quite hard to catch all of them. So it would be very helpful if you could supply minimal examples where the offset fails.

In the meantime you can use this gist https://gist.github.com/Timoeller/be6dfd8e34cdcd84fdca4c4aa72f42fc to correct the labels you produced.

Hope this helps. If not, we will find other solutions to make your dataset work in Haystack.
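
For illustration, here is one possible way to correct a shifted label. It is an independent sketch and does not reproduce the gist linked above. It assumes the gold answer text is correct and only `answer_start` has drifted, so it picks the occurrence of the text closest to the stored offset. The helper name `realign_answer` is hypothetical.

```python
def realign_answer(context, gold, start):
    """Return the start of the occurrence of `gold` in `context` that is
    closest to the stored offset `start`, or None if `gold` does not occur."""
    if not gold:
        return None
    positions = []
    pos = context.find(gold)
    while pos != -1:
        positions.append(pos)
        # Continue searching one character further to catch later occurrences.
        pos = context.find(gold, pos + 1)
    if not positions:
        return None
    return min(positions, key=lambda p: abs(p - start))


context = "ax bx ax"
fixed_late = realign_answer(context, "ax", 7)    # stored offset points at the trailing "x"
fixed_early = realign_answer(context, "ax", 1)   # stored offset points at the first "x"
missing = realign_answer(context, "zz", 0)       # gold text is not in the context at all
print(fixed_late, fixed_early, missing)
```

Answers for which the helper returns `None` cannot be repaired automatically and are better inspected by hand.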

@Timoeller
Contributor

Seems fixed, closing now, feel free to reopen.
