
[TUTORIAL] Create a synthetic dataset with Mistral and distilabel #35

Merged (11 commits) on Jun 12, 2024

Conversation

@sdiazlor (Contributor) commented May 22, 2024

In this tutorial, we generate instructions with the self-instruct approach and then produce two candidate answers for each instruction with Mistral AI models. A stronger model (mistral-large) judges the two answers. Finally, we use the argilla package to analyze the dataset and push it to the Hugging Face Hub. A simplified sketch of the generate-and-judge step is included at the end of this description.

  • Runnable in Colab
  • Added to README

Thanks in advance @sophiamyang
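
As a rough illustration of the generate-and-judge idea, here is a minimal sketch. It is not the notebook's code: the notebook builds this with distilabel's pipeline abstractions, while the sketch calls the Mistral API directly. The client version (mistralai >= 1.0), model names, and judge prompt are assumptions for illustration only.

```python
# Minimal sketch, not the notebook's code: two smaller Mistral models answer an
# instruction, then mistral-large rates both answers. Assumes the mistralai >= 1.0
# Python client; model names and the judge prompt are illustrative choices.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def ask(model: str, prompt: str) -> str:
    response = client.chat.complete(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

instruction = "Explain the self-instruct approach in one paragraph."

# Two candidate answers from smaller models.
answer_a = ask("open-mistral-7b", instruction)
answer_b = ask("open-mixtral-8x7b", instruction)

# A stronger model judges the candidates (a hand-written rubric, not the notebook's).
judge_prompt = (
    "Rate each answer to the instruction from 1 to 10 and explain briefly.\n\n"
    f"Instruction: {instruction}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}"
)
print(ask("mistral-large-latest", judge_prompt))
```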

@gabrielmbmb left a comment

LGTM! Thanks for pulling this @sdiazlor

distilabel_synthetic_dpo_dataset.ipynb (review thread, outdated and resolved)
@sophiamyang (Collaborator) commented Jun 10, 2024

Hi @sdiazlor, thanks for the PR! Could you add your notebook to the third-party folder https://github.com/mistralai/cookbook/tree/main/third_party?

@pandora-s-git (Collaborator)

Hi there! Honest question: what exactly is the advantage of using two smaller models and judging the answers with Mistral Large? Is it really cheaper and/or better than generating a dataset directly with Mistral Large, without a judge? I understand that DPO requires two candidate answers and picking the better one, but will this dataset be better than one generated directly by Large?

@sdiazlor (Contributor, Author) commented Jun 11, 2024

@pandora-s-git Thanks for your question. The main point is that DPO strongly helps align the model's outputs with people's preferences. A basic QA dataset gives the model no signal about which answers are better or worse, and diversity is reduced. That said, using two small models is usually cheaper (sometimes free if they are open source), and their answers are usually of good quality. The larger model helps not only to build the DPO dataset but also to support the annotators (whose review is highly recommended to obtain a high-quality dataset) in making their decisions. Other alignment approaches that need less data, such as KTO or DOVE, have also appeared. If you are interested, this blog may be worth a read.
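
To make the DPO framing concrete, here is a minimal sketch (again, not the notebook's code) of how a chosen/rejected preference record can be assembled once a judge has rated two candidate answers; the field names, example texts, and ratings are purely illustrative.

```python
# Illustrative only: build a DPO-style preference record from two judged answers.
# The notebook uses distilabel's pipeline for this; the field names here are arbitrary.

def build_dpo_record(instruction, answer_a, answer_b, rating_a, rating_b):
    """Return a chosen/rejected pair based on the judge's ratings."""
    if rating_a >= rating_b:
        chosen, rejected = answer_a, answer_b
    else:
        chosen, rejected = answer_b, answer_a
    return {
        "prompt": instruction,
        "chosen": chosen,
        "rejected": rejected,
    }

record = build_dpo_record(
    "Explain what a synthetic dataset is.",
    "A synthetic dataset is generated by a model rather than collected from humans.",
    "It's data.",
    rating_a=9,
    rating_b=3,
)
print(record["chosen"])  # the higher-rated answer becomes the preferred response
```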

@sdiazlor (Contributor, Author)

@sophiamyang Notebook moved to the third_party folder.

@pandora-s-git (Collaborator)

Yeah, after some thinking on my own I figured it actually makes sense given how DPO usually works. Thanks for the blog though, will check it out!!

@sophiamyang merged commit fc21916 into mistralai:main on Jun 12, 2024