Releases: pytorch/torchtune

v0.3.0

18 Sep 01:57

Overview

We haven’t had a new release for a little while now, so there is a lot in this one. Highlights include FSDP2 recipes for full finetuning and LoRA (including QLoRA), support for DoRA fine-tuning, a PPO recipe for RLHF, Qwen2 models in several sizes, a ton of memory and performance improvements (try our recipes with torch compile! try our sample packing with flex attention!), and Comet ML integration. To get the full set of perf and memory improvements, we recommend installing with the PyTorch nightlies.

New Features

Here are highlights of some of our new features in 0.3.0.

Recipes

  • Full finetune FSDP2 recipe (#1287)
  • LoRA FSDP2 recipe with faster training than FSDP1 (#1517)
  • RLHF with PPO (#1005)
  • DoRA (#1115)
  • SimPO (#1223)

Models

  • Qwen2 0.5B, 1.5B, and 7B models (#1143, #1247)
  • Flamingo model components (#1357)
  • CLIP encoder and vision transform (#1127)

Perf, memory, and quantization

  • Per-layer compile: 90% faster compile time and 75% faster training time (#1419)
  • Sample packing with flex attention: 80% faster training time with compile vs unpacked (#1193)
  • Chunked cross-entropy to reduce peak memory (#1390)
  • Make KV cache optional (#1207)
  • Option to save adapter checkpoint only (#1220)
  • Delete logits before the backward pass, saving ~4 GB (#1235)
  • Quantize linears without LoRA applied to NF4 (#1119)
  • Compile model and loss (#1296, #1319)
  • Speed up QLoRA initialization (#1294)
  • Set LoRA dropout to 0.0 to save memory (#1492)
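
The chunked cross-entropy change above follows a simple idea: instead of upcasting the entire `[num_tokens, vocab_size]` logit tensor for one log-softmax, compute the loss chunk by chunk so only a slice of that buffer is live at a time. A minimal sketch of the technique (not torchtune's exact implementation):

```python
import torch
import torch.nn.functional as F

def chunked_cross_entropy(
    logits: torch.Tensor, targets: torch.Tensor, num_chunks: int = 4
) -> torch.Tensor:
    """Mean cross-entropy computed chunk by chunk over the token dimension.

    Upcasting logits to float32 for a numerically stable log-softmax
    normally materializes a full [num_tokens, vocab_size] buffer;
    chunking keeps only one slice of that buffer live at a time.
    """
    total = logits.new_zeros((), dtype=torch.float32)
    for logit_chunk, target_chunk in zip(
        logits.chunk(num_chunks, dim=0), targets.chunk(num_chunks, dim=0)
    ):
        # Sum (not mean) per chunk so the final average is exact even
        # when chunks have unequal sizes.
        total = total + F.cross_entropy(
            logit_chunk.float(), target_chunk, reduction="sum"
        )
    return total / targets.numel()
```

Because each chunk reduces to a scalar with `reduction="sum"`, the result matches an unchunked `F.cross_entropy` up to floating-point error while peaking at a fraction of the activation memory.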

Data/Datasets

  • Multimodal datasets: The Cauldron and LLaVA-Instruct-150K (#1158)
  • Multimodal collater (#1156)
  • Tokenizer redesign for better model-specific feature support (#1082)
  • Create general SFTDataset combining instruct and chat (#1234)
  • Interleaved image support in tokenizers (#1138)
  • Image transforms for CLIP encoder (#1084)
  • Vision cross-attention mask transform (#1141)
  • Support images in messages (#1504)
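
The unified SFT dataset above rests on one idea: both instruct- and chat-style data are first converted to a common list-of-messages representation, and only then tokenized. A hypothetical sketch of that pattern (names are illustrative, not torchtune's API):

```python
# Hypothetical sketch: class and function names are illustrative only.
class SFTDataset:
    """Unify instruct- and chat-style data behind one message format."""

    def __init__(self, data, message_transform, tokenize):
        self._data = data                      # raw samples (dicts)
        self._to_messages = message_transform  # sample -> list of messages
        self._tokenize = tokenize              # messages -> token ids

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        messages = self._to_messages(self._data[idx])
        return self._tokenize(messages)

def instruct_to_messages(sample):
    # Map an {"instruction": ..., "output": ...} row to chat-style messages;
    # a chat dataset would supply a different message_transform instead.
    return [
        {"role": "user", "content": sample["instruction"]},
        {"role": "assistant", "content": sample["output"]},
    ]
```

Swapping the `message_transform` is then all it takes to support a new data format, while tokenization and collation stay shared.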

Miscellaneous

  • Deep fusion modules (#1338)
  • CometLogger integration (#1221)
  • Add profiler to full finetune recipes (#1288)
  • Support memory viz tool through the profiler (#1382, #1384)
  • Add RSO loss (#1197)
  • Add support for non-incremental decoding (#973)
  • Move utils directory to training (#1432, #1519, …)
  • Add bf16 dtype support on CPU (#1218)
  • Add grad norm logging (#1451)

Documentation

  • QAT tutorial (#1105)
  • Recipe docs pages and memory optimizations tutorial (#1230)
  • Add download commands to model API docs (#1167)
  • Updates to utils API docs (#1170)

Bug Fixes

  • Prevent pad ids and special tokens from displaying in generate (#1211)
  • Revert Gemma checkpoint logic that caused a missing head weight (#1168)
  • Fix compile on PyTorch 2.4 (#1512)
  • Fix Llama 3.1 RoPE init for compile (#1544)
  • Fix checkpoint load for FSDP2 with CPU offload (#1495)
  • Add missing quantization to Llama 3.1 layers (#1485)
  • Fix accuracy number parsing in Eleuther eval test (#1135)
  • Allow adding custom system prompt to messages (#1366)
  • Cast DictConfig -> dict in instantiate (#1450)

New Contributors (auto-generated by GitHub)

@sanchitintel made their first contribution in #1218
@lulmer made their first contribution in #1134
@stsouko made their first contribution in #1238
@spider-man-tm made their first contribution in #1220
@winglian made their first contribution in #1119
@fyabc made their first contribution in #1143
@mreso made their first contribution in #1274
@gau-nernst made their first contribution in #1288
@lucylq made their first contribution in #1269
@dzheng256 made their first contribution in #1221
@ChinoUkaegbu made their first contribution in #1310
@janeyx99 made their first contribution in #1382
@Gasoonjia made their first contribution in #1385
@shivance made their first contribution in #1417
@yf225 made their first contribution in #1419
@thomasjpfan made their first contribution in #1363
@AnuravModak made their first contribution in #1429
@lindawangg made their first contribution in #1451
@andrewldesousa made their first contribution in #1470
@mirceamironenco made their first contribution in #1523
@mikaylagawarecki made their first contribution in #1315

v0.2.1 (llama3.1 patch)

25 Jul 19:51

Overview

This patch includes support for fine-tuning Llama3.1 with torchtune as well as various improvements to the library.

New Features & Improvements

Models

  • Added support for Llama3.1 (#1208)

Modules

  • Tokenizer refactor to improve the extensibility of our tokenizer components (#1082)

v0.2.0

16 Jul 16:26

Overview

It’s been a while since our last release, and we have a ton of cool new features in the torchtune library, including distributed QLoRA support, new models, sample packing, and more! Check out #new-contributors for an exhaustive list of new contributors to the repo.

Enjoy the new release and happy tuning!

New Features

Here are some highlights of our new features in v0.2.0.

Recipes

  • We added support for QLoRA with FSDP2! This means users can now run 70B+ models on multiple GPUs. We provide example configs for Llama2 7B and 70B sizes. Note: this currently requires you to install PyTorch nightlies to access the FSDP2 methods. (#909)
  • Also by leveraging FSDP2, we see a 12% speedup in tokens/sec and a 3.2x speedup in model init over FSDP1 with LoRA (#855)
  • We added support for other variants of the Meta-Llama3 recipes including:
    • 70B with LoRA (#802)
    • 70B full finetune (#993)
    • 8B memory-efficient full finetune which saves 46% peak memory over previous version (#990)
  • We introduced a quantization-aware training (QAT) recipe. Training with QAT shows significant improvements in model quality if you plan to quantize your model post-training. (#980)
  • We made updates to the eval recipe, including:
    • Batched inference for faster eval (#947)
    • Support for free generation tasks in EleutherAI Eval Harness (#975)
    • Support for custom eval configs (#1055)
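
The QAT recipe above relies on "fake quantization": during training, the forward pass rounds values onto the quantized grid so the model learns to tolerate the rounding error it will see after post-training quantization. A toy, framework-free sketch of symmetric int8 fake quantization (not the 8da4w scheme torchtune uses):

```python
def fake_quantize(x: float, scale: float, qmin: int = -128, qmax: int = 127) -> float:
    """Round x to the nearest representable value on an int8 grid.

    During QAT this runs in the forward pass; the backward pass treats
    it as the identity (a straight-through estimator) so gradients flow.
    """
    q = round(x / scale)          # quantize to an integer level
    q = max(qmin, min(qmax, q))   # clamp to the int8 range
    return q * scale              # dequantize back to float
```

The model never stores integers during QAT; it only sees the rounding/clamping error, which is what makes the eventual int8 (or int4) conversion nearly lossless.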

Models

  • Phi-3 Mini-4K-Instruct from Microsoft (#876)
  • Gemma 7B from Google (#971)
  • Code Llama2: 7B, 13B, and 70B sizes from Meta (#847)
  • @salman designed and implemented reward modeling for Mistral models (#840, #991)

Perf, memory, and quantization

  • We made improvements to our FSDP + Llama3 recipe, resulting in 13% more savings in allocated memory for the 8B model. (#865)
  • Added Int8 per token dynamic activation + int4 per axis grouped weight (8da4w) quantization (#884)

Data/Datasets

  • We added support for a widely requested feature - sample packing! This feature drastically speeds up model training - e.g. 2X faster with the alpaca dataset. (#875, #1109)
  • In addition to our instruct tuning, we now also support continued pretraining and include several example datasets like wikitext and CNN DailyMail. (#868)
  • Users can now train on multiple datasets using concat datasets (#889)
  • We now support OpenAI conversation style data (#890)
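
Sample packing speeds training up by concatenating several short tokenized samples into one fixed-length sequence, so far fewer tokens are wasted on padding. A greedy sketch of the core idea (assumes each sample fits in `max_seq_len`; torchtune's packed dataset additionally tracks per-sample boundaries so attention does not cross documents):

```python
def pack_samples(samples, max_seq_len):
    """Greedily concatenate tokenized samples into packs of <= max_seq_len tokens.

    Assumes every individual sample has at most max_seq_len tokens. Real
    packed datasets also record document boundaries so a block-causal
    attention mask can keep samples from attending to each other.
    """
    packs, current = [], []
    for tokens in samples:
        if current and len(current) + len(tokens) > max_seq_len:
            # Current pack is full; start a new one.
            packs.append(current)
            current = []
        current = current + list(tokens)
    if current:
        packs.append(current)
    return packs
```

With short samples (like alpaca's), each pack replaces what would otherwise be several mostly-padding batch rows, which is where the ~2X speedup comes from.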

Miscellaneous

  • @jeromeku added a much more advanced profiler so users can understand the exact bottlenecks in their LLM training. (#1089)
  • We made several metric logging improvements:
    • Log tokens/sec, per-step logging, configurable memory logging (#831)
    • Better formatting for stdout memory logs (#817)
  • Users can now save models in a safetensor format. (#1096)
  • Updated activation checkpointing to support selective layer and selective op activation checkpointing (#785)
  • We worked with the Hugging Face team to provide support for loading adapter weights fine-tuned via torchtune directly into the PEFT library. (#933)

Documentation

  • We wrote a new tutorial for fine-tuning Llama3 with chat data (#823) and revamped the datasets tutorial (#994)
  • Looooooooong overdue, but we added proper documentation for the tune CLI (#1052)
  • Improved contributing guide (#896)

Bug Fixes

  • @Optimox found and fixed a bug to ensure that LoRA dropout was correctly applied (#996)
  • Fixed a broken link for Llama3 tutorial in #805
  • Fixed Gemma model generation (#1016)
  • Bug workaround: to download CNN DailyMail, launch a single-device recipe first; once the dataset is downloaded, you can use it with distributed recipes.

New Contributors

Full Changelog: v0.1.1...v0.2.0

v0.1.1 (llama3 patch)

18 Apr 18:51

Overview

This patch includes support for fine-tuning Llama3 with torchtune as well as various improvements to the library.

New Features & Improvements

Recipes

  • Added configuration for Llama2 13B QLoRA (#779)
  • Added support for Llama2 70B LoRA (#788)

Models

  • Added support for Llama3 (#793)

Utils

  • Improvements to Weights & Biases logger (#772, #777)

Documentation

  • Added Llama3 tutorial (#793)
  • Updated E2E tutorial with instructions for uploading to the Hugging Face Hub (#773)
  • Updates to the README (#775, #778, #786)
  • Added instructions for installing torchtune nightly (#792)

torchtune v0.1.0 (first release)

16 Apr 01:57

Overview

We are excited to announce the release of torchtune v0.1.0! torchtune is a PyTorch library for easily authoring, fine-tuning and experimenting with LLMs. The library emphasizes 4 key aspects:

  • Simplicity and Extensibility. A native-PyTorch, componentized design with easy-to-reuse abstractions
  • Correctness. High bar on proving the correctness of components and recipes
  • Stability. PyTorch just works. So should torchtune
  • Democratizing LLM fine-tuning. Works out-of-the-box on both consumer and professional hardware setups

torchtune is tested with the latest stable PyTorch release (2.2.2) as well as the preview nightly version.

New Features

Here are a few highlights of new features from this release.

Recipes

  • Added support for running a LoRA finetune using a single GPU (#454)
  • Added support for running a QLoRA finetune using a single GPU (#478)
  • Added support for running a LoRA finetune using multiple GPUs with FSDP (#454, #266)
  • Added support for running a full finetune using a single GPU (#482)
  • Added support for running a full finetune using multiple GPUs with FSDP (#251, #482)
  • Added WIP support for DPO (#645)
  • Integrated with EleutherAI Eval Harness for an evaluation recipe (#549)
  • Added support for quantization through integration with torchao (#632)
  • Added support for single-GPU inference (#619)
  • Created a config parsing system to interact with recipes through YAML and the command line (#406, #456, #468)
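
The config system above lets every recipe parameter live in YAML, with the same fields overridable from the command line. An illustrative excerpt (the field names and values below are an assumption for illustration, not copied from a shipped torchtune config, though `_component_` is the convention configs use to name the object to instantiate):

```yaml
# Illustrative recipe config excerpt -- field values are assumptions.
model:
  _component_: torchtune.models.llama2.lora_llama2_7b
  lora_rank: 8
  lora_alpha: 16

optimizer:
  _component_: torch.optim.AdamW
  lr: 3e-4
```

Any field can then be overridden at launch time without editing the file, e.g. appending `optimizer.lr=1e-4` to a `tune run ... --config ...` invocation.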

Models

  • Added support for Llama2 7B (#70, #137) and 13B (#571)
  • Added support for Mistral 7B (#571)
  • Added support for Gemma [WIP] (#630, #668)

Datasets

  • Added support for instruction and chat-style datasets (#752, #624)
  • Included example implementations of datasets (#303, #116, #407, #541, #576, #645)
  • Integrated with Hugging Face Datasets (#70)

Utils

  • Integrated with Weights & Biases for metric logging (#162, #660)
  • Created a checkpointer to handle model files from HF and Meta (#442)
  • Added a tune CLI tool (#396)

Documentation

In addition to documenting torchtune’s public-facing APIs, we include several new tutorials and “deep-dives” in our documentation.

  • Added LoRA tutorial (#368)
  • Added “End-to-End Workflow with torchtune” tutorial (#690)
  • Added datasets tutorial (#735)
  • Added QLoRA tutorial (#693)
  • Added deep-dive on the checkpointer (#674)
  • Added deep-dive on configs (#311)
  • Added deep-dive on recipes (#316)
  • Added deep-dive on Weights & Biases integration (#660)

Community Contributions

This release of torchtune features some amazing work from the community: