Add Generative QA Models like RAG #443

Closed
tholor opened this issue Sep 29, 2020 · 12 comments
Labels
type:feature New feature or request

Comments

@tholor
Member

tholor commented Sep 29, 2020

What?
So far, most generative QA models have not been very useful in practice, because they can only answer very generic questions that are covered by their "wiki + web" training corpora. For industry use cases, we mostly want to:

  • ask domain-specific questions
  • get the answer from a specified, reliable corpus
  • have a possibility to check the answer (e.g. by inspecting the document / context where it came from)

Pure generative models don't fulfill these requirements. However, recent retrieval-augmented approaches could be interesting to test.

How?

The latest transformers release comes with the RAG model from Facebook (https://ai.facebook.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models). We could add a "TransformersGenerator" class in Haystack that gets documents from the retriever and generates the answer conditioned on those.
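
A rough sketch of what such a class could look like, assuming a retriever that returns documents with an id and a text. The names Document, SketchGenerator, generate_fn, and predict are hypothetical and are not Haystack's or transformers' API; generate_fn stands in for whatever seq2seq model would actually produce the answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a document returned by a retriever."""
    doc_id: str
    text: str


class SketchGenerator:
    """Hypothetical generator: answers a query conditioned on retrieved documents."""

    def __init__(self, generate_fn: Callable[[str], str], top_k: int = 3):
        # generate_fn is an assumption: any callable mapping a prompt to an answer,
        # e.g. a thin wrapper around a seq2seq model.
        self.generate_fn = generate_fn
        self.top_k = top_k

    def predict(self, query: str, documents: List[Document]) -> Dict[str, object]:
        # Keep only the first top_k documents (assumed to be sorted by relevance).
        used = documents[: self.top_k]
        context = "\n".join(doc.text for doc in used)
        prompt = f"context: {context}\nquestion: {query}"
        answer = self.generate_fn(prompt)
        # Return the source ids so the answer can be checked against its context.
        return {"query": query, "answer": answer, "source_ids": [d.doc_id for d in used]}


if __name__ == "__main__":
    docs = [
        Document("d1", "Berlin is the capital of Germany."),
        Document("d2", "Paris is the capital of France."),
    ]
    # Dummy generator for illustration only: echoes the first context line.
    dummy = lambda prompt: prompt.splitlines()[0].replace("context: ", "")
    gen = SketchGenerator(dummy, top_k=1)
    print(gen.predict("What is the capital of Germany?", docs))
```

Returning the ids of the documents that were used is one way to cover the "possibility to check the answer" requirement listed above.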

@tholor tholor added the type:feature New feature or request label Sep 29, 2020
@Weilin37

This would be really cool for domain-specific literature review

@nsankar

nsankar commented Sep 30, 2020

@tholor With RAG as it is now, is it correct to pass the haystack retriever output as input to RAG, as in the snippet below, which is based on the RAG reference code?

# formatted_retriever_outputs: placeholder for the formatted haystack retriever outputs
input_dict = tokenizer.prepare_seq2seq_batch(formatted_retriever_outputs, return_tensors="pt")
input_ids = input_dict["input_ids"]
outputs = model(input_ids=input_ids, labels=input_dict["labels"])

@tholor
Member Author

tholor commented Oct 5, 2020

@nsankar I haven't had time to look into this yet, but I believe something like this should be possible with the RagTokenForGeneration class. Your "haystack retriever outputs" would be pretty much standard strings that we can pass there. Have you tried it already?
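
As a minimal sketch of what "pretty much standard strings" could mean here: each retrieved document is flattened into one plain string together with the question. The function name format_for_generator and the dict keys "title" and "text" are hypothetical, and the " / " and " // " separators are assumptions chosen to resemble RAG's title and document separators; none of this is confirmed in this thread.

```python
from typing import Dict, List


def format_for_generator(question: str, docs: List[Dict[str, str]],
                         title_sep: str = " / ", doc_sep: str = " // ") -> List[str]:
    """Build one plain string per retrieved document: title / text // question."""
    formatted = []
    for doc in docs:
        # A missing title becomes an empty string; surrounding quotes are dropped.
        title = doc.get("title", "").strip('"')
        formatted.append(f"{title}{title_sep}{doc['text']}{doc_sep}{question}")
    return formatted


if __name__ == "__main__":
    # Hypothetical retriever output, for illustration only.
    retrieved = [{"title": "Berlin", "text": "Berlin is the capital of Germany."}]
    for line in format_for_generator("what is the capital of germany", retrieved):
        print(line)
```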

@nsankar

nsankar commented Oct 6, 2020

@tholor I have yet to try this. I will report back once I do. Thank you for the input.

@Weilin37

I'm here to cheerlead you guys on! Can't wait for this

@tholor
Member Author

tholor commented Oct 23, 2020

Thanks to @lalitpagaria, the RAG integration in #484 is almost done!
We just need to do a FARM release before we can merge. The current plan is to do the release next week.

If you are very eager to try it, you can already test it on the branch itself. There's already a small tutorial notebook!
Just make sure to run pip install -e . once to install the latest FARM version from master, which is required for RAG.

@Weilin37

@tholor I am definitely eager to try! Where is the tutorial notebook? I'll try it on the branch

@tholor
Member Author

tholor commented Oct 23, 2020

@lalitpagaria
Contributor

lalitpagaria commented Oct 23, 2020

@Weilin37 In this notebook you need to use a different haystack version.
Change:

!pip install git+https://github.com/deepset-ai/haystack/
!pip install urllib3==1.25.4

to:

!pip install git+https://github.com/lalitpagaria/haystack.git@implement_RAG
!pip install urllib3==1.25.4

@Weilin37

Thank you!

I am getting a "No module named 'haystack.generator'" error.
I guess this is the pip install -e . thing? However I seem to get an error when I run that command. Something about setup.py not found and directory cannot be installed in editable mode.
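
One way to confirm which installation is active is to check whether the submodule can be located at all. A small sketch using only the standard library; has_module is a hypothetical helper name, and the check assumes that the module named in the error message, haystack.generator, only exists in the RAG branch at this point.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the dotted module name can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return False


if __name__ == "__main__":
    for name in ("haystack", "haystack.generator"):
        print(name, "found" if has_module(name) else "missing")
```

If "haystack" is found but "haystack.generator" is missing, the environment is probably still using a haystack install without the RAG changes.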

@lalitpagaria
Contributor

Please try this script: https://github.com/lalitpagaria/haystack/blob/implement_RAG/tutorials/Tutorial7_RAG_Generator.py

Install the branch with pip install git+https://github.com/lalitpagaria/haystack.git@implement_RAG in a separate virtualenv.

@tholor
Member Author

tholor commented Nov 2, 2020

Fixed by #484

@tholor tholor closed this as completed Nov 2, 2020