Skip to content

Commit

Permalink
add WildChat paper
Browse files Browse the repository at this point in the history
  • Loading branch information
mlabonne committed May 4, 2024
1 parent e4a8ecd commit 6c1205e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ The goal of general-purpose datasets is to transform base models into versatile
| [Bagel](https://github.com/jondurbin/bagel) | >2M? | Jon Durbin | Jan 2024 | Collection of datasets decontaminated with cosine similarity. |
| [Hercules v4.5](https://huggingface.co/datasets/Locutusque/hercules-v4.5) | 1.72M | Sebastian Gabarain | Apr 2024 | Large-scale general-purpose dataset with math, code, RP, etc. See [v4](https://huggingface.co/datasets/Locutusque/hercules-v4.0) for the list of datasets. |
| [Dolphin-2.9](https://huggingface.co/datasets/cognitivecomputations/Dolphin-2.9) | 1.39M | Cognitive Computations | Apr 2023 | Large-scale general-purpose dataset used by the Dolphin models. |
| [WildChat-1M](https://huggingface.co/datasets/allenai/WildChat-1M) | 1.04M | Zhao et al. | May 2023 | Real conversations between human users and GPT-3.5/4, including demographic data, including state, country, hashed IP addresses, and request headers. |
| [WildChat-1M](https://huggingface.co/datasets/allenai/WildChat-1M) | 1.04M | Zhao et al. | May 2023 | Real conversations between human users and GPT-3.5/4, including metadata. See the [WildChat paper](https://arxiv.org/abs/2405.01470). |
| [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) | 1M | Teknium | Nov 2023 | Another large-scale dataset used by the OpenHermes models. |
| [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) | 518k | Lian et al. | Sep 2023 | Curated subset of [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) using GPT-4-as-a-judge to remove wrong answers. |
| [Tulu V2 Mix](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture) | 326k | Ivison et al. | Nov 2023 | Mix of high-quality datasets. See [Tulu 2 paper](https://arxiv.org/abs/2311.10702). |
Expand Down

0 comments on commit 6c1205e

Please sign in to comment.