Retrieval-based Answer Generation

Unlike Extractive systems, Generative/Abstractive Question Answering systems create answers to questions based on some source of knowledge. The answers generated by abstractive systems can aggregate information contained in multiple source passages and read like human-written text.

Generative closed-book 📕 systems (ChatGPT-like)

Today everyone knows and talks about ChatGPT 💬. From the point of view of Question Answering, it is a closed-book system: it relies only on internal knowledge. This knowledge is also known as "parametric memory": it is stored in the model weights and accumulated during training.
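
As a toy illustration of what answering from parametric memory means, here is a minimal closed-book sketch. It assumes Hugging Face transformers with google/flan-t5-small as a small stand-in generator (ChatGPT itself is only reachable through OpenAI's API), and the question is an arbitrary example.

```python
# Minimal closed-book QA sketch. Assumption: flan-t5-small as a small stand-in
# for a much larger generative model; no external knowledge is provided.
from transformers import pipeline

qa = pipeline("text2text-generation", model="google/flan-t5-small")

# No context passage is supplied: the answer can only come from parametric
# memory, i.e. what the model absorbed into its weights during training.
print(qa("Who wrote the novel Dracula?", max_new_tokens=10)[0]["generated_text"])
```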

When used in isolation, abstractive closed-book QA systems have some serious limitations:

❌ their knowledge is not tailored to a specific domain and is expensive and difficult to update over time

❌ they can produce "hallucinations"

Retriever + Generator

To overcome the disadvantages of generative closed-book solutions, several systems have been developed that share a similar idea:

  • use a Retriever 🔎 to collect passages of text relevant to the user's question
  • use the non-parametric knowledge stored in those passages to influence Answer Generation (see the sketch below)
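
A minimal sketch of this retrieve-then-generate pattern follows; it is not the implementation of any specific system. It assumes rank_bm25 as a sparse Retriever 🔎 and a small Hugging Face seq2seq model as the Generator, with a toy corpus and question.

```python
# Retrieve-then-generate sketch. Assumptions: rank_bm25 for sparse retrieval,
# flan-t5-small as the generator, and a toy three-document corpus.
from rank_bm25 import BM25Okapi
from transformers import pipeline

corpus = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "The Colosseum in Rome could hold an estimated 50,000 spectators.",
    "Mount Everest is the highest mountain above sea level.",
]
question = "When was the Eiffel Tower completed?"

# 1) Retriever 🔎: rank the corpus against the question, keep the best passages.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
top_passages = bm25.get_top_n(question.lower().split(), corpus, n=2)

# 2) Generator: condition the answer on the retrieved (non-parametric) knowledge.
generator = pipeline("text2text-generation", model="google/flan-t5-small")
prompt = (
    "Answer the question using the context.\n"
    f"Context: {' '.join(top_passages)}\n"
    f"Question: {question}"
)
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```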

Over the years, several such systems combining the two components have been proposed, for example ORQA (Google), REALM (Google), RAG (Meta), FiD (Meta), and RETRO (DeepMind). Probably the most popular is Retrieval-Augmented Generation (RAG), proposed by Patrick Lewis et al. in 2020, which has made a comeback in recent months...

Fusion-in-Decoder (FiD)

(Figure: the Fusion-in-Decoder architecture)

This system is not that famous, but it is simple and effective. It was introduced by Gautier Izacard and Edouard Grave (Meta Research) in 2021.

  • for Retrieval from Wikipedia, the authors considered two methods: BM25 (sparse retrieval) and Dense Passage Retrieval (DPR, dense retrieval). Since the retriever is not trained, FiD is potentially compatible with any retrieval system.

  • the generative model is a sequence-to-sequence network pretrained on unsupervised data, such as T5 (a transformer with an encoder-decoder architecture). Each retrieved passage and its title are concatenated with the question and processed by the encoder independently of the other passages. The decoder then performs attention over the concatenation of the resulting representations of all the retrieved passages (hence "Fusion-in-Decoder"); a minimal sketch of this flow follows below.
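
Below is a minimal sketch of the Fusion-in-Decoder flow using Hugging Face's T5, not the authors' official implementation: each (question, title, passage) string is encoded independently, the encoder outputs are concatenated, and the decoder generates while attending over all of them. The passages, the t5-small checkpoint and the input template are illustrative assumptions; a t5-small without QA fine-tuning will not produce good answers, the snippet only shows the data flow.

```python
# Fusion-in-Decoder sketch (an approximation with Hugging Face T5, not the
# official FiD code): encode passages independently, fuse them for the decoder.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "Where was Marie Curie born?"
# These passages would normally come from the retriever (BM25 or DPR).
passages = [
    {"title": "Marie Curie", "text": "Marie Curie was born in Warsaw in 1867."},
    {"title": "Pierre Curie", "text": "Pierre Curie was born in Paris in 1859."},
]

# 1) Concatenate each passage and its title with the question; encode each
#    (question, title, passage) string independently of the others.
inputs = [
    f"question: {question} title: {p['title']} context: {p['text']}" for p in passages
]
enc = tokenizer(inputs, return_tensors="pt", padding=True, truncation=True, max_length=256)
encoder_states = model.encoder(
    input_ids=enc.input_ids, attention_mask=enc.attention_mask
).last_hidden_state                                  # (n_passages, seq_len, hidden)

# 2) Concatenate the per-passage representations along the sequence axis, so the
#    decoder attends over all retrieved passages at once (fusion in the decoder).
fused = encoder_states.reshape(1, -1, encoder_states.size(-1))
fused_mask = enc.attention_mask.reshape(1, -1)

answer_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused),
    attention_mask=fused_mask,
    max_new_tokens=20,
)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```

Because the passages are encoded independently, the encoder's cost grows linearly with their number, and the joint reasoning over all of them happens only in the decoder's cross-attention; this is what lets FiD scale to many retrieved passages.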

Experiments and results:

  • The FiD system has been trained and evaluated on 3 different QA datasets
  • while conceptually simple, the trained models are competitive with or better than closed-book approaches, while being much smaller
  • the major performance improvements come from leveraging the retrieved knowledge and from scaling to a large number of jointly processed passages

A lesson to take home

  • (Large) Language Models 🧠 have strong text comprehension/generation skills
  • their knowledge is generic and is not easily updated over time
  • when building NLP applications, we can combine LMs with 🔎 Retrieval systems to provide new/domain-specific knowledge and make them answer factually!

Resources