add WildChat

mlabonne · May 4, 2024 · e4a8ecd
1 parent 4463bf6
Showing 1 changed file with 3 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -31,8 +31,9 @@ The goal of general-purpose datasets is to transform base models into versatile
 | Dataset | # | Authors | Date | Notes |
 | ------------------------------------------------------------------------------------------------------------- | ----- | ---------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | [Bagel](https://github.com/jondurbin/bagel) | >2M? | Jon Durbin | Jan 2024 | Collection of datasets decontaminated with cosine similarity. |
-| [Hercules v4.5](https://huggingface.co/datasets/Locutusque/hercules-v4.5) | 1.72M | Sebastian Gabarain | Apr 2024 | Large-scale general-purpose dataset with math, code, RP, etc. See [v4](https://huggingface.co/datasets/Locutusque/hercules-v4.0) for the list of datasets. |
-| [Dolphin-2.9](https://huggingface.co/datasets/cognitivecomputations/Dolphin-2.9) | 1.39M | Cognitive Computations | Apr 2023 | Large-scale general-purpose dataset used by the Dolphin models. |
+| [Hercules v4.5](https://huggingface.co/datasets/Locutusque/hercules-v4.5) | 1.72M | Sebastian Gabarain | Apr 2024 | Large-scale general-purpose dataset with math, code, RP, etc. See [v4](https://huggingface.co/datasets/Locutusque/hercules-v4.0) for the list of datasets. |
+| [Dolphin-2.9](https://huggingface.co/datasets/cognitivecomputations/Dolphin-2.9) | 1.39M | Cognitive Computations | Apr 2023 | Large-scale general-purpose dataset used by the Dolphin models. |
+| [WildChat-1M](https://huggingface.co/datasets/allenai/WildChat-1M) | 1.04M | Zhao et al. | May 2023 | Real conversations between human users and GPT-3.5/4, including demographic data (state, country), hashed IP addresses, and request headers. |
 | [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) | 1M | Teknium | Nov 2023 | Another large-scale dataset used by the OpenHermes models. |
 | [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) | 518k | Lian et al. | Sep 2023 | Curated subset of [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) using GPT-4-as-a-judge to remove wrong answers. |
 | [Tulu V2 Mix](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture) | 326k | Ivison et al. | Nov 2023 | Mix of high-quality datasets. See [Tulu 2 paper](https://arxiv.org/abs/2311.10702). |