mergoo


mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With mergoo, you can efficiently integrate the knowledge of different generic or domain-based LLM experts.

🚀 Features

  • Supports several merging methods: Mixture-of-Experts, Mixture-of-Adapters, and layer-wise merging
  • Flexible merging for each layer
  • Base models supported: Llama, Mistral, and BERT
  • Trainers supported: 🤗 Trainer, SFTTrainer, PEFT
  • Devices supported: CPU, MPS, GPU
  • Training choices: only the routers of the MoE layers, or full fine-tuning of the merged LLM

If you like the project, consider leaving a ⭐️

Installation

Install with pip:

pip install mergoo

Install the latest (unstable) version from GitHub:

pip install git+https://github.com/Leeroo-AI/mergoo

Install from source:

git clone https://github.com/Leeroo-AI/mergoo
cd mergoo
pip install -e .

Quick Start

Configuration Setup

Specify the config for merging:

  • model_type: type of the base model. Choices: mistral, llama, or bert.
  • num_experts_per_tok: number of experts activated for each token in the MoE layers.
  • experts: configs of the experts to merge; each includes an expert_name and a Hugging Face 🤗 model_id.
  • router_layers: layers chosen for applying Mixture-of-Experts.

Fully Fine-tuned Experts

This is a sample config for merging fully fine-tuned LLM experts.

config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
        {"expert_name": "expert_2", "model_id": "ajibawa-2023/Code-Mistral-7B"}
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"]
}

In the example above, we merge math and code Mistral-based experts. Please refer to this notebook for further details!

Mixture of Adapters (MoE on LoRA)

This is a sample config for merging LoRA fine-tuned LLM experts. mergoo builds a routing layer on top of the LoRAs, resulting in a mixture of adapters.

config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "base_model": "mistralai/Mistral-7B-v0.1",
    "experts": [
        {"expert_name": "adapter_1", "model_id": "predibase/customer_support"},
        {"expert_name": "adapter_2", "model_id": "predibase/customer_support_accounts"},
        {"expert_name": "adapter_3", "model_id": "predibase/customer_support_orders"},
        {"expert_name": "adapter_4", "model_id": "predibase/customer_support_payments"}
    ],
}

Note that for LoRA experts, expert_name starts with adapter instead of expert. Please refer to this notebook for further details!

Merge Experts

Once the config is set up, mergoo creates the merged LLM as follows:

import torch
from mergoo.compose_experts import ComposeExperts

# create checkpoint
model_id = "data/mistral_lora_moe"
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint(model_id)
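
The saved checkpoint can be sanity-checked with a quick generation before any fine-tuning. The following is a minimal sketch, assuming the merged checkpoint is loaded with the mergoo Mistral class shown in the next section and that the base expert's tokenizer is reused; the prompt and generation settings are illustrative only.

import torch
from transformers import AutoTokenizer
from mergoo.models.modeling_mistral import MistralForCausalLM

# Tokenizer of the base expert; the merged checkpoint keeps the base vocabulary.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe", torch_dtype=torch.float16)

inputs = tokenizer("Write a Python function that adds two numbers.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))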

Load / Finetune Merged Expert

Now, you can easily train the merged LLM with Hugging Face Trainer:

from transformers import Trainer
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe") 
# NOTE: the 'gate' / router layers are untrained, so a weight-loading warning will appear for them

trainer = Trainer( ... )
trainer.train()
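
A fuller training setup might look like the sketch below. It freezes everything except the router ("gate") layers so that only the MoE routers are trained, matching the router-only option in the feature list; the "gate" substring match, the training arguments, and the output path are illustrative assumptions, and train_dataset is a tokenized dataset you provide.

import torch
from transformers import Trainer, TrainingArguments
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe", torch_dtype=torch.bfloat16)

# Train only the router ("gate") layers; freeze all other weights.
# NOTE: the "gate" substring is an assumption based on the warning above;
# inspect model.named_parameters() to confirm the router parameter names.
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name

training_args = TrainingArguments(
    output_dir="checkpoints/mistral_moe_router",  # hypothetical output path
    per_device_train_batch_size=1,
    num_train_epochs=1,
    learning_rate=1e-5,
    bf16=True,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # your tokenized dataset (assumed to exist)
)
trainer.train()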

📚 Learn More:

After finishing the Quick Start guide, you can explore the tutorials below to further familiarize yourself with mergoo.

  • MoE with fully fine-tuned LLM experts: Build a unified Mixture-of-Experts model with fully fine-tuned experts. Inspired by BTX Research (Meta AI).
  • MoE with LoRA fine-tuned experts: Build a Mixture-of-Adapters expert. Inspired by xlora | Mixture-of-LoRAs | MoLE | PHATGOOSE | MoELoRA.
  • Hugging Face Blog: Deep dive into the research details behind the merging methods of the mergoo library.

Mergoo Roadmap and Contributing

As an open-source library in a fast-evolving domain, we welcome contributions, whether they introduce new features, enhance infrastructure, or improve documentation.

Here is the mergoo roadmap:

  • Support MoE for Transformer Block
  • Compatibility with Hugging Face 🤗
  • Support Trainer, SFTTrainer
  • Loading Unified Checkpoint in BTX
  • Feature: Convertible QKV linear layers
  • Feature: Convertible FF linear layers
  • Feature: Routers only for a list of decoder layers indexes
  • Sharded Safetensor Saving
  • Support experts based on LLaMa and Mistral
  • Router Load balancing loss
  • Lazy loading of tensors for low memory usage in Merging
  • Support Mixture of LoRA Experts (Mixture of Adapters)
  • Support other Layer-wise merging methods, including Mergekit
  • Support experts based on Gemma and Mamba
  • Support flash-attention
  • Support Mixture of Depths Transformer

Feel free to suggest new features and/or contribute to the mergoo roadmap!

Join our community!

🚀 We would love to hear your feedback. Please join the Leeroo community:

Have a question not listed here? Open a GitHub Issue or send us an email!
