LLM Agora

LLM Agora is the place where open-source LLMs debate and revise their responses!

The LLM Agora 🗣️🏦 aims to improve the quality of open-source LLMs' responses through the debate-and-revision process introduced in Improving Factuality and Reasoning in Language Models through Multiagent Debate. We would like to thank the authors of the paper for the brilliant ideas that allowed us to pursue this project.

Did you know? 🤔 LLMs can also improve their responses by debating with other LLMs! 😮 We applied this concept to several open-source LLMs to verify that open-source models, not just proprietary ones, can meaningfully improve their responses through discussion. 🤗 To that end, we developed LLM Agora! You can try LLM Agora and check the example responses in the LLM Agora Space!

We followed the overall framework of llm_multiagent_debate and added extras such as Chain-of-Thought (CoT) prompting. Our LLM Agora experiments confirmed that, despite remaining shortcomings, open-source LLMs can also improve the quality of their responses through multi-agent debate.

ToC

  1. Introduction & Motivation
  2. What is LLM Agora?
  3. Experiments
  4. Analysis
  5. Future work
  6. How to do?

Introduction & Motivation

The LLM Agora project is inspired by the multi-agent debate introduced in the paper 'Improving Factuality and Reasoning in Language Models through Multiagent Debate', as mentioned above. Therefore, before introducing LLM Agora, we would like to explain the concept of multi-agent debate!

With the remarkable development of LLMs, they have become capable of producing responses of significantly higher quality; GPT-4, for example, can pass even difficult exams. Despite the brilliant performance of proprietary LLMs, however, their first responses still contain errors and mistakes. So how can these responses be corrected and revised? The paper suggests that debate between several agents can revise the responses and improve performance! Several experiments proved that this method can correct errors in responses and improve their quality. (If you want to know more, please check the paper's official GitHub page!)

In the paper, the overall experiments are conducted with only one model, but the Analysis section notes that a synergy effect with further improved performance appears when different types of LLMs are used. LLM Agora is inspired by exactly this point!

We started the LLM Agora project with the expectation that debate among several open-source LLMs could create a synergy effect that compensates for the shortcomings these models still have. If multi-agent debate can improve the quality of open-source LLMs' responses, we believe it could be a groundbreaking method.

What is LLM Agora?

An 'Agora' was a place where meetings were held in ancient Greece. We thought this meaning was similar to a multi-agent debate, so we named the project LLM Agora. The differences between the paper's multi-agent debate and LLM Agora are summarized as follows:

  1. Models: Several open-source LLMs were utilized, unlike the paper, which used a proprietary LLM (ChatGPT). In addition, we analyzed whether using open-source LLMs in multi-agent debate is effective, and we used various models to check for the synergy effect.
  2. Summarization: The paper's debate prompt concatenated the agents' responses. However, according to the paper's experimental results, it is more effective to summarize the models' responses and use the summary instead. Therefore, we summarize the models' responses with ChatGPT and use the summary as the debate prompt (see the sketch after this list).
  3. Chain-of-Thought: We used Chain-of-Thought prompting in the multi-agent debate to confirm whether open-source LLMs can improve their performance through Chain-of-Thought and to determine its impact on the debate.
  4. HuggingFace Space: We implemented LLM Agora as a HuggingFace Space so that people can use it directly and check the responses generated in our experiments. It's open to everyone, so check it out: LLM Agora Space
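
To make the debate loop concrete, here is a minimal sketch of a single round, assuming a generic `generate(model, prompt)` helper for the open-source LLMs and the `openai` Python client for the ChatGPT summarization step. The model list and helper names are illustrative assumptions, not the repository's actual API.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical open-source debate agents; substitute your own models.
MODELS = ["model-a", "model-b", "model-c"]

def generate(model: str, prompt: str) -> str:
    """Placeholder for querying an open-source LLM (e.g. through an
    inference endpoint); replace with your own serving code."""
    raise NotImplementedError

def summarize(responses: list[str]) -> str:
    """Summarize the agents' answers with ChatGPT so the debate prompt
    stays short (point 2 above)."""
    joined = "\n\n".join(f"Agent {i + 1}: {r}" for i, r in enumerate(responses))
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Summarize these answers:\n\n{joined}"}],
    )
    return completion.choices[0].message.content

def debate_round(question: str, responses: list[str]) -> list[str]:
    """One debate round: each agent sees a ChatGPT summary of the
    previous answers and produces a revised response."""
    summary = summarize(responses)
    debate_prompt = (
        f"Question: {question}\n"
        f"Summary of the other agents' answers: {summary}\n"
        "Using this as additional advice, give an updated answer."
    )
    return [generate(model, debate_prompt) for model in MODELS]
```

Starting from each model's initial answer to the question, calling `debate_round` repeatedly for a few rounds reproduces the iterative debate-and-revision loop described above.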

We hope that LLM Agora will be used in the future as a way to improve the performance of open-source models as well as proprietary models. Once again, we would like to thank the authors of 'Improving Factuality and Reasoning in Language Models through Multiagent Debate' for suggesting the idea of multi-agent debate.

Experiments