mergoo

mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With mergoo, you can integrate the knowledge of different generic or domain-specific LLM experts.

🚀 Features

  • Supports recent merging methods, including Mixture-of-Experts and layer-wise merging
  • Flexible merging choice for each layer
  • Base models supported: Llama and Mistral
  • Trainers supported: 🤗 Trainer, SFTTrainer
  • Devices supported: CPU, MPS, GPU
  • Training choices: fine-tune only the routers of MoE layers, or fully fine-tune the merged LLM

If you like the project, consider leaving a ⭐️

Installation

Install with pip:

pip install mergoo

Install the latest (unstable) version from GitHub:

pip install git+https://github.com/Leeroo-AI/mergoo

Install from source:

git clone https://github.com/Leeroo-AI/mergoo
cd mergoo
pip install -e .

Quick Start

Merging Models
A sample usage: define the merge config and create the merged model

import torch
from mergoo.compose_experts import ComposeExperts

model_id = "data/mistral-math-code-moe"
config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
        {"expert_name": "expert_2", "model_id": "ajibawa-2023/Code-Mistral-7B"}
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"]
}

# create checkpoint
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint(model_id)

Loading / Fine-tuning Merged Models

from transformers import Trainer
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral-math-code-moe")
# NOTE: the 'gate' / router layers are untrained, so a weight-loading warning will appear for them

trainer = Trainer( ... )
trainer.train()
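For the "router only" training choice listed under Features, a minimal sketch is to freeze everything except the router layers before handing the model to the trainer. This assumes the router modules carry "gate" in their parameter names; adjust the filter to match your checkpoint.

import torch
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral-math-code-moe", torch_dtype=torch.float16)

# Keep gradients only for the router ('gate') layers; freeze everything else.
# Assumption: router parameters contain "gate" in their names.
for name, param in model.named_parameters():
    param.requires_grad_("gate" in name)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable router parameters: {trainable:,}")

The partially frozen model can then be passed to 🤗 Trainer or SFTTrainer as in the snippet above.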

📚 Learn More:

After finishing the Quick Start guide, you can explore the tutorials below to further familiarize yourself with mergoo.

Notebook: Unified MoE with Domain Experts
Details: Build a unified Mixture-of-Experts model with domain-based LLM experts, inspired by the BTX research.

Mergoo Roadmap and Contributing

As an open-source library in a fast-evolving domain, we welcome contributions, whether they introduce new features, enhance infrastructure, or improve documentation.

Here is the mergoo roadmap:

  • Support MoE for Transformer Block
  • Compatibility with Hugging Face 🤗
  • Support Trainer, SFTTrainer
  • Loading Unified Checkpoint in BTX
  • Feature: Convertible QKV linear layers
  • Feature: Convertible FF linear layers
  • Feature: Routers only for a list of decoder layer indexes
  • Sharded Safetensor Saving
  • Support experts based on Llama and Mistral
  • Router load-balancing loss
  • Lazy loading of tensors for low memory usage during merging
  • Support Mixture of LoRA Experts (base model with multiple trained LoRAs)
  • Support Layer-wise merging, including Mergekit
  • Support experts based on Gemma and Mamba
  • Support flash-attention
  • Support Mixture-of-Depths Transformer

Feel free to suggest new features and/or contribute to the mergoo roadmap!

Join our community!

🚀 We would love to hear your feedback! Please join the Leeroo community:

Have a question not listed here? Open a GitHub Issue or send us an email!
