ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections (ICML2024)

Official PyTorch implementation of ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections, M. Bini, K. Roth, Z. Akata, A. Khoreva (ICML 2024)

Paper | Contact | Cite

TLDR: ETHER and its relaxation ETHER+ finetune pretrained models by applying hyperplane reflections to the pretrained weights. Both methods are extremely parameter-efficient (~10-100x fewer parameters than OFT or LoRA) while showing high robustness to learning rate and other hyperparameter choices (a minimal sketch of the transformation follows the list below).

  • ETHER is the fastest and most parameter-efficient (one vector per finetuned layer)
  • ETHER+ is the best-performing variant
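The core transformation is easy to sketch. Below is a minimal, self-contained illustration, not the official implementation: the class name `ETHERLinear` is made up here, and the side on which the reflection multiplies the pretrained weight is an assumption of this sketch. A single trainable vector u defines a Householder (hyperplane) reflection H = I - 2uu^T/||u||^2 that multiplies the frozen pretrained weight.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ETHERLinear(nn.Module):
    """Minimal sketch of ETHER for a single linear layer (not the official code).

    The frozen pretrained weight W is transformed multiplicatively by a Householder
    (hyperplane) reflection H = I - 2 u u^T / ||u||^2, where the vector u is the
    only trainable parameter of the layer.
    """

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)  # frozen, shape [out, in]
        self.bias = pretrained.bias
        # One trainable vector per finetuned layer; here the reflection acts on the
        # output dimension (which side it is applied on is an assumption of this sketch).
        self.u = nn.Parameter(torch.randn(pretrained.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.u / self.u.norm().clamp_min(1e-8)                  # unit normal of the hyperplane
        H = torch.eye(u.numel(), device=x.device, dtype=x.dtype) \
            - 2.0 * torch.outer(u, u)                               # reflection across the hyperplane
        return F.linear(x, H @ self.weight, self.bias)              # finetuned weight W' = H W
```

Note that a reflection, unlike an additive update, is never the identity; the paper argues this is benign because reflections stay at a bounded distance from the identity, which is tied to the method's hyperparameter robustness.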

Project Structure

This project is split into two separate directories, each of which contains the code and instructions to reproduce the results in the paper:

./ether-instruct: code for finetuning Llama2-7B on Instruction Tuning. This is built on top of the litgpt repository, allowing for easy finetuning over a multitude of language models and datasets.

./ether-control: code for finetuning Stable Diffusion on ControlNet-like tasks (such as Semantic Map to Image). [Work in Progress]

Best Practices

Choosing the number of diagonal blocks n:

TLDR: using block-diagonal transformations reduces computation and enables parallelization in multiplicative finetuning, significantly speeding up training. In ETHER/ETHER+, increasing n does not change the number of trainable parameters and has only a marginal impact on performance (see the sketch after the table below).

  • increasing n leads to a significant speed-up on models with a large hidden dimension (e.g., 4096 for Llama2-7B vs. 2048 for Phi1.5-1.3B)
  • surprisingly, for Llama2-7B Alpaca finetuning, n=32 provides the overall best results for ETHER/ETHER+
| ft method      | Phi1.5-1.3B TFLOPs | rel. drop | Llama2-7B TFLOPs | rel. drop |
|----------------|--------------------|-----------|------------------|-----------|
| ETHER (n=1)    | 9.13               | -         | 25.26            | -         |
| ETHER (n=4)    | 7.07               | -23%      | 12.07            | -52%      |
| ETHER (n=32)   | 6.71               | -27%      | 8.22             | -68%      |
| ETHER+ (n=1)   | 10.78              | -         | 51.65            | -         |
| ETHER+ (n=4)   | 7.69               | -29%      | 18.66            | -64%      |
| ETHER+ (n=32)  | 6.79               | -37%      | 9.04             | -83%      |
| (LoRA r=8)     | (6.04)             | -         | (6.85)           | -         |
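To illustrate where the speed-up comes from, the sketch below (a hypothetical helper, not the repository's code) applies the reflection block-diagonally: the hidden dimension d is split into n blocks of size d/n, each with its own reflection vector, so n small matrix products replace one dense d x d product while the total number of trainable parameters stays at d regardless of n.

```python
import torch

def block_diagonal_ether(W: torch.Tensor, U: torch.Tensor) -> torch.Tensor:
    """Apply a block-diagonal ETHER-style transform to a frozen weight W of shape [d, k].

    U holds n reflection vectors of shape [n, d // n]; the trainable parameter
    count is n * (d // n) = d, independent of n, while the dense d x d reflection
    is replaced by n small (d/n x d/n) reflections applied in parallel.
    """
    n, b = U.shape                                           # n blocks of size b = d // n
    Un = U / U.norm(dim=1, keepdim=True).clamp_min(1e-8)     # unit hyperplane normals
    eye = torch.eye(b, device=W.device, dtype=W.dtype)
    H = eye - 2.0 * torch.einsum("ni,nj->nij", Un, Un)       # n Householder reflections
    W_blocks = W.reshape(n, b, -1)                           # split the rows of W into n blocks
    return torch.bmm(H, W_blocks).reshape_as(W)              # batched small matmuls

# Example: d = 4096 (Llama2-7B hidden size), n = 32 blocks of size 128
W = torch.randn(4096, 4096)
U = torch.randn(32, 4096 // 32)
W_ft = block_diagonal_ether(W, U)   # same trainable parameter count (4096) as n = 1
```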

Visualizations

Qualitative evidence of learning rate robustness (on Subject-driven Generation with Stable Diffusion):

Acknowledgments

This code repository builds on the implementation of litgpt.

Citation

If you find ETHER useful, please consider citing our work:

@misc{bini2024ether,
      title={ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections}, 
      author={Massimo Bini and Karsten Roth and Zeynep Akata and Anna Khoreva},
      year={2024},
      eprint={2405.20271},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

License

This code repository is open-sourced under the MIT license.
