diff --git a/README.md b/README.md
index 6961ab0..7ed0df5 100644
--- a/README.md
+++ b/README.md
@@ -30,21 +30,22 @@ The goal of general-purpose datasets is to transform base models into versatile
 | Dataset | # | Authors | Date | Notes |
 | ------------------------------------------------------------------------------------------------------------- | ----- | ---------------------------- | -------- | --------------------------------------------------------------------------------- |
-| πŸ†• [Buzz](https://huggingface.co/datasets/H-D-T/Buzz) | 31.2M | Alignment Lab AI | May 2024 | Huge collection of 435 datasets with data augmentation, deduplication, and other techniques. |
-| πŸ†• [WebInstructSub](https://huggingface.co/datasets/chargoddard/WebInstructSub-prometheus) | 2.39M | Yue et al. | May 2024 | Instructions created by retrieving document from Common Crawl, extracting QA pairs, and refining them. See the [MAmmoTH2 paper](https://arxiv.org/abs/2405.03548) (this is a subset). |
-| [Bagel](https://github.com/jondurbin/bagel) | >2M? | Jon Durbin | Jan 2024 | Collection of datasets decontaminated with cosine similarity. |
-| [Hercules v4.5](https://huggingface.co/datasets/Locutusque/hercules-v4.5) | 1.72M | Sebastian Gabarain | Apr 2024 | Large-scale general-purpose dataset with math, code, RP, etc. See [v4](https://huggingface.co/datasets/Locutusque/hercules-v4.0) for the list of datasets. |
+| [Buzz](https://huggingface.co/datasets/H-D-T/Buzz) | 31.2M | Alignment Lab AI | May 2024 | Huge collection of 435 datasets with data augmentation, deduplication, and other techniques. |
+| [WebInstructSub](https://huggingface.co/datasets/chargoddard/WebInstructSub-prometheus) | 2.39M | Yue et al. | May 2024 | Instructions created by retrieving documents from Common Crawl, extracting QA pairs, and refining them. See the [MAmmoTH2 paper](https://arxiv.org/abs/2405.03548) (this is a subset). |
+| [Bagel](https://github.com/jondurbin/bagel) | >2M? | Jon Durbin | Jan 2024 | Collection of datasets decontaminated with cosine similarity. |
+| [Hercules v4.5](https://huggingface.co/datasets/Locutusque/hercules-v4.5) | 1.72M | Sebastian Gabarain | Apr 2024 | Large-scale general-purpose dataset with math, code, RP, etc. See [v4](https://huggingface.co/datasets/Locutusque/hercules-v4.0) for the list of datasets. |
 | [Dolphin-2.9](https://huggingface.co/datasets/cognitivecomputations/Dolphin-2.9) | 1.39M | Cognitive Computations | Apr 2023 | Large-scale general-purpose dataset used by the Dolphin models. |
-| [WildChat-1M](https://huggingface.co/datasets/allenai/WildChat-1M) | 1.04M | Zhao et al. | May 2023 | Real conversations between human users and GPT-3.5/4, including metadata. See the [WildChat paper](https://arxiv.org/abs/2405.01470). |
-| [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) | 1M | Teknium | Nov 2023 | Another large-scale dataset used by the OpenHermes models. |
-| [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) | 518k | Lian et al. | Sep 2023 | Curated subset of [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) using GPT-4-as-a-judge to remove wrong answers. |
-| [Tulu V2 Mix](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture) | 326k | Ivison et al. | Nov 2023 | Mix of high-quality datasets. See [Tulu 2 paper](https://arxiv.org/abs/2311.10702). |
-| [UltraInteract SFT](https://huggingface.co/datasets/openbmb/UltraInteract_sft) | 289k | Yuan et al. | Apr 2024 | Focus on math, coding, and logic tasks with step-by-step answers. See [Eurus paper](https://arxiv.org/abs/2404.02078). |
-| [NeurIPS-LLM-data](https://huggingface.co/datasets/upaya07/NeurIPS-LLM-data) | 204k | Jindal et al. | Nov 2023 | Winner of [NeurIPS LLM Efficiency Challenge](https://llm-efficiency-challenge.github.io/), with an interesting data preparation strategy. |
-| [UltraChat 200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | 200k | Tunstall et al., Ding et al. | Oct 2023 | Heavily filtered version of the [UItraChat](https://github.com/thunlp/UltraChat) dataset, consisting of 1.4M dialogues generated by ChatGPT. |
-| [WizardLM_evol_instruct_V2](https://huggingface.co/datasets/mlabonne/WizardLM_evol_instruct_v2_196K-ShareGPT) | 143k | Xu et al. | Jun 2023 | Latest version of Evol-Instruct applied to Alpaca and ShareGPT data. See [WizardLM paper](https://arxiv.org/abs/2304.12244). |
-| [sft_datablend_v1](https://huggingface.co/datasets/nvidia/sft_datablend_v1) | 128k | NVIDIA | Jan 2024 | Blend of publicly available datasets: OASST, CodeContests, FLAN, T0, Open_Platypus, and GSM8K and others (45 total). |
-| [Synthia-v1.3](https://huggingface.co/datasets/migtissera/Synthia-v1.3) | 119k | Migel Tissera | Nov 2023 | High-quality synthetic data generated using GPT-4. |
+| [WildChat-1M](https://huggingface.co/datasets/allenai/WildChat-1M) | 1.04M | Zhao et al. | May 2023 | Real conversations between human users and GPT-3.5/4, including metadata. See the [WildChat paper](https://arxiv.org/abs/2405.01470). |
+| [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) | 1M | Teknium | Nov 2023 | Another large-scale dataset used by the OpenHermes models. |
+| [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) | 518k | Lian et al. | Sep 2023 | Curated subset of [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) using GPT-4-as-a-judge to remove wrong answers. |
+| [Tulu V2 Mix](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture) | 326k | Ivison et al. | Nov 2023 | Mix of high-quality datasets. See [Tulu 2 paper](https://arxiv.org/abs/2311.10702). |
+| πŸ†• [Magpie-Pro](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered) | 300k | Xu et al. | Jun 2024 | High-quality samples directly extracted from Llama 3 70B Instruct via a new technique. See [Magpie paper](https://arxiv.org/abs/2406.08464). |
+| [UltraInteract SFT](https://huggingface.co/datasets/openbmb/UltraInteract_sft) | 289k | Yuan et al. | Apr 2024 | Focus on math, coding, and logic tasks with step-by-step answers. See [Eurus paper](https://arxiv.org/abs/2404.02078). |
+| [NeurIPS-LLM-data](https://huggingface.co/datasets/upaya07/NeurIPS-LLM-data) | 204k | Jindal et al. | Nov 2023 | Winner of [NeurIPS LLM Efficiency Challenge](https://llm-efficiency-challenge.github.io/), with an interesting data preparation strategy. |
+| [UltraChat 200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | 200k | Tunstall et al., Ding et al. | Oct 2023 | Heavily filtered version of the [UltraChat](https://github.com/thunlp/UltraChat) dataset, consisting of 1.4M dialogues generated by ChatGPT. |
+| [WizardLM_evol_instruct_V2](https://huggingface.co/datasets/mlabonne/WizardLM_evol_instruct_v2_196K-ShareGPT) | 143k | Xu et al. | Jun 2023 | Latest version of Evol-Instruct applied to Alpaca and ShareGPT data. See [WizardLM paper](https://arxiv.org/abs/2304.12244). |
+| [sft_datablend_v1](https://huggingface.co/datasets/nvidia/sft_datablend_v1) | 128k | NVIDIA | Jan 2024 | Blend of publicly available datasets: OASST, CodeContests, FLAN, T0, Open_Platypus, GSM8K, and others (45 total). |
+| [Synthia-v1.3](https://huggingface.co/datasets/migtissera/Synthia-v1.3) | 119k | Migel Tissera | Nov 2023 | High-quality synthetic data generated using GPT-4. |
 | [FuseChat-Mixture](https://huggingface.co/datasets/FuseAI/FuseChat-Mixture) | 95k | Wan et al. | Feb 2024 | Selection of samples from high-quality datasets. See [FuseChat paper](https://arxiv.org/abs/2402.16107). |
 | [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) | 84.4k | KΓΆpf et al. | Mar 2023 | Human-generated assistant-style conversation corpus in 35 different languages. See [OASST1 paper](https://arxiv.org/abs/2304.07327) and [oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2). |
 | [WizardLM_evol_instruct_70k](https://huggingface.co/datasets/mlabonne/WizardLM_evol_instruct_70k-ShareGPT) | 70k | Xu et al. | Apr 2023 | Evol-Instruct applied to Alpaca and ShareGPT data. See [WizardLM paper](https://arxiv.org/abs/2304.12244). |