# Training

## Supervised Fine-tuning

For efficient LLM fine-tuning, we use low-rank adaptation (LoRA) from 🤗 Hugging Face's PEFT library. This involves freezing the base model's parameters and introducing a small number of learnable parameters.
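To make the mechanism concrete, here is a minimal PyTorch sketch of the LoRA idea (not the PEFT library's actual implementation): the base weight is frozen, and a low-rank update `(alpha/r) * B @ A` provides the only trainable parameters. The class and hyperparameter names are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the base model's weights
        # A is initialized small, B at zero, so the adapter starts as a no-op
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # 8192 trainable out of 270848 total
```

With `r=8` on a 512×512 layer, only about 3% of the parameters are trainable, which is what makes LoRA fine-tuning cheap; PEFT wraps the target layers of a pretrained model in exactly this fashion.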

For those with limited GPU memory, it's recommended to quantize certain computations to 8-bit or 4-bit precision using LLM.int8() or QLoRA. Note that this might result in a slight training slowdown compared to the fp16 or bf16 versions.
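As a toy illustration of why 8-bit quantization saves memory, the sketch below implements absmax int8 quantization, the basic building block behind LLM.int8() (the real bitsandbytes implementation is considerably more sophisticated, e.g. it handles outlier features separately):

```python
import numpy as np

def absmax_quantize(w: np.ndarray):
    """Quantize a float tensor to int8 by scaling its largest magnitude to 127."""
    scale = 127.0 / np.max(np.abs(w))
    q = np.round(w * scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 values."""
    return q.astype(np.float32) / scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = absmax_quantize(w)
w_hat = dequantize(q, scale)
err = np.max(np.abs(w - w_hat))  # bounded by 0.5 * max|w| / 127
```

Each weight shrinks from 4 bytes (fp32) or 2 bytes (fp16/bf16) to 1 byte, at the cost of a small reconstruction error; the extra quantize/dequantize steps are also why training can be slightly slower than pure fp16/bf16.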

Tools like DeepSpeed or FSDP are highly recommended for distributed training. FlashAttention is essential for speeding up training and reducing memory usage with long sequences.
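For reference, a DeepSpeed setup is typically driven by a JSON config; the fragment below is a hedged sketch of a ZeRO stage-2 configuration with bf16 (batch sizes and thresholds are placeholder values to adapt to your hardware):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```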

More examples can be found in the examples directory.

As of version 2.2, I have refactored the training code, integrating specific elements inspired by the excellent training framework Axolotl. Thanks to the Axolotl team for their contributions to the open-source community! The primary motivation for maintaining my own framework is to keep full control over the entire training process and customize it to my specific needs; for additional features, I highly recommend using Axolotl.