ERROR - farm.data_handler.processor - Error message: Answer using start/end indices is 'x' while gold label text is 'ax' #492

sbhttchryy · 2020-10-15T14:35:36Z

Hi, I am facing this error. In every case, the answer start and end indices point at only part of the gold label text rather than at the whole text. Could you please help me with this? The log is:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/data_silo.py", line 124, in _dataset_from_chunk
dataset = processor.dataset_from_dicts(dicts=dicts, indices=indices)
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/processor.py", line 1144, in dataset_from_dicts
dataset, tensor_names = self._create_dataset(keep_baskets=False)
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/processor.py", line 308, in _create_dataset
features_flat.extend(sample.features)
TypeError: 'NoneType' object is not iterable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/home/krystian/haystack/haystack/src/models/finetuned_models/QA_MC_BERT.py", line 8, in <module>
reader.train(data_dir=train_data, train_filename="Final_session_1_2_no_null_1.json", use_gpu=True, n_epochs=1, save_dir="test_model_no_null_entries")
File "/data/home/krystian/haystack/haystack/reader/farm.py", line 198, in train
data_silo = DataSilo(processor=processor, batch_size=batch_size, distributed=False, max_processes=num_processes)
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/data_silo.py", line 105, in __init__
self._load_data()
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/data_silo.py", line 207, in _load_data
self.data["train"], self.tensor_names = self._get_dataset(train_file)
File "/data/home/krystian/test_env/lib/python3.7/site-packages/farm/data_handler/data_silo.py", line 176, in _get_dataset
for dataset, tensor_names in results:
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
TypeError: 'NoneType' object is not iterable
Thank you.

sbhttchryy · 2020-10-16T16:13:40Z

Dear developers, any clue about this problem? I am fine-tuning on German QA pairs from a particular domain, using the "mrm8488/bert-multi-cased-finetuned-xquadv1" model.

tholor · 2020-10-16T16:59:48Z

Hey @sbhttchryy,
Are you sure you don't have any null entries in your JSON file? They can cause trouble, as you already reported in #488.

Can you maybe share a small script + toy dataset so that we can reproduce this on our side?
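A quick way to check for null entries is to walk the annotation file and list every field that is JSON null. The sketch below assumes the file follows the usual SQuAD layout (`data` → `paragraphs` → `qas` → `answers`) and that "null entry" means a `null` value for the context, question, answer text, or answer start; `find_null_entries` is a hypothetical helper, not part of FARM or haystack.

```python
import json


def find_null_entries(squad_dict):
    """Return the locations of fields whose value is None (JSON null)."""
    problems = []
    for d_idx, doc in enumerate(squad_dict.get("data") or []):
        for p_idx, para in enumerate((doc or {}).get("paragraphs") or []):
            para = para or {}
            where = f"data[{d_idx}].paragraphs[{p_idx}]"
            if para.get("context") is None:
                problems.append(f"{where}.context")
            for q_idx, qa in enumerate(para.get("qas") or []):
                qa = qa or {}
                q_where = f"{where}.qas[{q_idx}]"
                if qa.get("question") is None:
                    problems.append(f"{q_where}.question")
                for a_idx, ans in enumerate(qa.get("answers") or []):
                    for key in ("text", "answer_start"):
                        # A null answer object counts as null for both keys.
                        if ans is None or ans.get(key) is None:
                            problems.append(f"{q_where}.answers[{a_idx}].{key}")
    return problems


def check_file(path):
    """Load a SQuAD-style JSON file and return its null-entry locations."""
    with open(path, encoding="utf-8") as f:
        return find_null_entries(json.load(f))
```

Calling `check_file` on the training file returns an empty list when no such nulls are present.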

Timoeller · 2020-10-19T09:19:11Z

Hey @sbhttchryy, I presume you created the dataset with our annotation tool https://annotate.deepset.ai/index.html?

We have some issues there related to offsets, e.g. #403. We tried to fix them, but because of differing OS-related settings it is quite hard to catch all of them. So if you could supply minimal examples where the offset fails, that would be very helpful.
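To collect such minimal examples, one option is to compare each answer's text with the slice of the context that its start offset selects. This is only a sketch under the assumption that the data is in SQuAD format with `answer_start` and `text` fields; `find_offset_mismatches` is a hypothetical helper and does not reproduce FARM's internal check.

```python
def find_offset_mismatches(squad_dict):
    """List answers whose start offset does not select the gold text."""
    mismatches = []
    for doc in squad_dict["data"]:
        for para in doc["paragraphs"]:
            context = para["context"]
            for qa in para["qas"]:
                for ans in qa.get("answers", []):
                    start = ans["answer_start"]
                    gold = ans["text"]
                    # Text that the stored offset actually points at.
                    found = context[start:start + len(gold)]
                    if found != gold:
                        mismatches.append(
                            {"id": qa.get("id"), "start": start,
                             "gold": gold, "found": found}
                        )
    return mismatches
```

Each returned record holds the question id, the stored offset, the gold text, and the text found at that offset, which should be enough to build a small reproducible sample.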

In the meantime you can use this gist https://gist.github.com/Timoeller/be6dfd8e34cdcd84fdca4c4aa72f42fc to correct the labels you produced.

Hope this helps; if not, we will find other solutions to make your dataset work in haystack.
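For illustration, a label correction of this kind could look like the sketch below. It is not the code from the linked gist; it assumes the gold text is correct and only the offset drifted, and it moves the offset to the occurrence of the gold text nearest to the stored position. `realign_answer_start` is a hypothetical helper name.

```python
def realign_answer_start(context, gold, start):
    """Return the start of the occurrence of `gold` closest to `start`.

    Returns `start` unchanged if it already matches, and None if `gold`
    does not occur in `context` at all.
    """
    if context[start:start + len(gold)] == gold:
        return start
    best = None
    pos = context.find(gold)
    while pos != -1:
        # Keep the candidate with the smallest distance to the stored offset.
        if best is None or abs(pos - start) < abs(best - start):
            best = pos
        pos = context.find(gold, pos + 1)
    return best
```

Answers for which the function returns None cannot be fixed by shifting the offset and would need to be re-annotated or dropped.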

Timoeller · 2020-11-16T10:27:50Z

Seems fixed, closing now, feel free to reopen.

sbhttchryy added the question label Oct 15, 2020

tholor assigned Timoeller Oct 19, 2020

This was referenced Oct 19, 2020

Error on finetuning with empty entries in the annotated json file #488

Closed

Could not convert this sample to features deepset-ai/FARM#596

Closed

Timoeller closed this as completed Nov 16, 2020

Timoeller mentioned this issue Feb 26, 2021

Loosing Input Data in Classification Task deepset-ai/FARM#699

Closed
