Commit

fix order
mlabonne committed Apr 29, 2024
1 parent b70f450 commit 4851d1b
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions README.md
@@ -57,10 +57,10 @@ LLMs often struggle with mathematical reasoning and formal logic, which has led

| Dataset | # | Authors | Date | Notes |
| ----------------------------------------------------------------------------------- | ---- | ------------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------- |
-| [OpenMathInstruct-1](https://huggingface.co/datasets/nvidia/OpenMathInstruct-1) | 5.75M | Toshniwal et al. | Feb 2024 | Problems from GSM8K and MATH, solutions generated by Mixtral-8x7B |
| [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | 395k | Yu et al. | Dec 2023 | Bootstrap mathematical questions by rewriting them from multiple perspectives. See [MetaMath paper](https://arxiv.org/abs/2309.12284). |
| [MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct) | 262k | Yue et al. | Sep 2023 | Compiled from 13 math rationale datasets, six of which are newly curated, with a focus on chain-of-thought and program-of-thought rationales. |
| [Orca-Math](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) | 200k | Mitra et al. | Feb 2024 | Grade school math word problems generated using GPT-4 Turbo. See [Orca-Math paper](https://arxiv.org/pdf/2402.14830.pdf). |
+| [OpenMathInstruct-1](https://huggingface.co/datasets/nvidia/OpenMathInstruct-1) | 5.75M | Toshniwal et al.<br>(NVIDIA) | Feb 2024 | Problems from GSM8K and MATH, solutions generated by Mixtral-8x7B |

### Code

@@ -97,8 +97,8 @@ Function calling allows large language models (LLMs) to execute predefined funct

| Dataset | # | Authors | Date | Notes |
| ------------------------------------------------------------------------------------------------- | ----- | --------------- | -------- | ----------------------------------------------------------------------------------- |
-| [Agent-FLAN](https://huggingface.co/datasets/internlm/Agent-FLAN) | 34.4k | internlm | Mar 2024 | Mix of AgentInstruct, ToolBench, and ShareGPT datasets. |
| [glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | 113k | Sahil Chaudhary | Sep 2023 | High-quality dataset with pairs of instructions and answers in different languages. <br>See [Locutusque/function-calling-chatml](https://huggingface.co/datasets/Locutusque/function-calling-chatml) for a variant without conversation tags. |
+| [Agent-FLAN](https://huggingface.co/datasets/internlm/Agent-FLAN) | 34.4k | internlm | Mar 2024 | Mix of AgentInstruct, ToolBench, and ShareGPT datasets. |

## ⚖️ Preference alignment

@@ -135,6 +135,10 @@ Start by aggregating available data from various sources (open-source or not) an
* [**Bonito**](https://github.com/BatsResearch/bonito): Library for generating synthetic instruction tuning datasets for your data without GPT (see also [AutoBonito](https://colab.research.google.com/drive/1l9zh_VX0X4ylbzpGckCjH5yEflFsLW04?usp=sharing)).
* [**Augmentoolkit**](https://github.com/e-p-armstrong/augmentoolkit): Framework to convert raw text into datasets using open-source and closed-source models.

+## Acknowledgments
+
+Special thanks to [geronimi73](https://github.com/geronimi73) for the PR.
+
## References

- Wei-Lin Chiang et al., "Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality," 2023.