Incorrect generation of training data in GPL training #2751
I have a feeling that the pseudo_label_generator.py file in Haystack might be generating incorrect training data.
Hi @aditya-malte, we compared the results generated by Haystack's implementation of GPL to the results generated by the reference implementation and didn't find any differences. What models did you use for the question generator and the cross-encoder: the ones used in the tutorial, or did you change them? Did you make any other changes to the tutorial, for example, did you use different data? Maybe @vblagoje can help here?
@aditya-malte, thanks for your report. The questions Julian posted are what I would have asked. But maybe it would be simpler if you shared your notebook so we can take a look?
Ping @aditya-malte, any updates? Have you noticed these issues in the GPL tutorial? Would love to hear back from you on this one.
I am closing this issue due to a lack of response from the reporter. We'll reopen it if a unit test or other clear evidence reveals a problem with GPL.
Describe the bug
Hi,
I referred to this tutorial to train my own GPL model.
On closer observation of the generated training data, I noticed two things: "pos" and "neg" appear to be switched in some places, and the generated labels seem inaccurate.
Expected behavior
"pos" and "neg" should not be switched anywhere, and the generated labels should be more accurate.
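For context, GPL training data consists of (query, positive passage, negative passage) triples, each labeled with a margin score from a cross-encoder: score(query, pos) minus score(query, neg). The sketch below is a hypothetical sanity check (not Haystack code) for the swapping symptom described here, using a toy word-overlap scorer in place of a real cross-encoder:

```python
def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: word overlap between query and passage.
    Stands in for a real cross-encoder; purely illustrative."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def margin_label(query: str, pos: str, neg: str, score_fn=overlap_score) -> float:
    """GPL-style margin label: score(query, pos) - score(query, neg)."""
    return score_fn(query, pos) - score_fn(query, neg)

def fraction_negative(triples, score_fn=overlap_score) -> float:
    """Share of (query, pos, neg) triples with a negative margin.
    If this is far higher than hard-negative mining alone would explain,
    the "pos" and "neg" fields may have been switched upstream."""
    margins = [margin_label(q, p, n, score_fn) for q, p, n in triples]
    return sum(m < 0 for m in margins) / len(margins)
```

Running `fraction_negative` over a sample of the generated triples (with the real cross-encoder as `score_fn`) would give a quick quantitative check for the swapping issue.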
Additional context
To Reproduce
FAQ Check
System: