Official PyTorch implementation of ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections, M. Bini, K. Roth, Z. Akata, A. Khoreva (ICML 2024)
TLDR: ETHER and its relaxation ETHER+ finetune pretrained models by applying hyperplane reflections to the pretrained weights (see the sketch after the list below). Both methods are extremely parameter-efficient (~10-100 times fewer parameters than OFT or LoRA) while remaining highly robust to learning rate and hyperparameter choices.
- ETHER is the fastest and most parameter-efficient (one vector per finetuned layer)
- ETHER+ is the best performing
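Below is a minimal sketch of the idea on a single linear layer, assuming the standard Householder form H = I - 2uu^T with a unit-norm u; the class and attribute names are illustrative, not the exact implementation in this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ETHERLinear(nn.Module):
    """Sketch of ETHER-style finetuning of a frozen linear layer.

    A single trainable vector `u` defines a hyperplane reflection
    H = I - 2 u u^T / ||u||^2, applied multiplicatively to the frozen
    pretrained weight W. Illustrative only.
    """

    def __init__(self, pretrained_linear: nn.Linear):
        super().__init__()
        self.weight = pretrained_linear.weight          # frozen pretrained W (out x in)
        self.bias = pretrained_linear.bias
        self.weight.requires_grad_(False)
        # one trainable vector per finetuned layer
        self.u = nn.Parameter(torch.randn(self.weight.shape[0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = F.normalize(self.u, dim=0)                  # unit-norm hyperplane normal
        H = torch.eye(u.numel(), device=u.device) - 2.0 * torch.outer(u, u)
        w = H @ self.weight                             # reflected weight H W
        return F.linear(x, w, self.bias)
```

Since a reflection is orthogonal, its distance from the identity stays fixed throughout training, which bounds how far finetuning can push the pretrained weights and is what the paper connects to the observed learning-rate robustness.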
This project is split into two separate directories, each containing the code and instructions to reproduce the results in the paper:
./ether-instruct
: code for finetuning Llama2-7B on Instruction Tuning. This is built on top of the litgpt repository, allowing for easy finetuning over a multitude of language models and datasets.
./ether-control
: code for finetuning Stable Diffusion on ControlNet-like tasks (such as Semantic Map to Image). [Work in Progress]
Choosing the number of diagonal blocks n:
TLDR: using block-diagonal transformations reduces computation and enables parallelization in multiplicative finetuning, significantly speeding up training. In ETHER/ETHER+, increasing n does not change the number of trainable parameters and has only a marginal impact on performance (see the block-diagonal sketch after the table below).
- increasing n leads to a significant speed-up on models with a large hidden dimension (e.g. 4096 for Llama2-7B vs. 2048 for Phi1.5-1.3B)
- surprisingly, on Llama2-7B Alpaca finetuning, n=32 gives the overall best results for ETHER/ETHER+
| ft method | Phi1.5-1.3B TFLOPs | rel. drop | Llama2-7B TFLOPs | rel. drop |
|---|---|---|---|---|
| ETHER n=1 | 9.13 | - | 25.26 | - |
| ETHER n=4 | 7.07 | -23% | 12.07 | -52% |
| ETHER n=32 | 6.71 | -27% | 8.22 | -68% |
| ETHER+ n=1 | 10.78 | - | 51.65 | - |
| ETHER+ n=4 | 7.69 | -29% | 18.66 | -64% |
| ETHER+ n=32 | 6.79 | -37% | 9.04 | -83% |
| (LoRA r=8) | (6.04) | - | (6.85) | - |
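A minimal sketch of the block-diagonal variant, assuming the same Householder form as above: the trainable vector is split into n chunks, each defining a smaller reflection applied to its own row-block of the frozen weight, so the parameter count is unchanged while the matmul cost drops roughly by a factor of n. Function and variable names are illustrative, not taken from the repository code:

```python
import torch
import torch.nn.functional as F

def block_diagonal_reflection(weight: torch.Tensor, u: torch.Tensor, n: int) -> torch.Tensor:
    """Apply n block-diagonal Householder reflections to a frozen weight.

    `u` has length d_out regardless of n (it is just split into n chunks),
    so the number of trainable parameters stays constant while each block
    multiply operates on a (d_out/n x d_out/n) matrix. Illustrative only.
    """
    d_out, d_in = weight.shape
    assert d_out % n == 0
    b = d_out // n                                       # block size
    u_blocks = F.normalize(u.view(n, b), dim=1)          # n unit-norm vectors
    # n small reflections H_i = I - 2 u_i u_i^T, shape (n, b, b)
    H = torch.eye(b, device=u.device) - 2.0 * torch.einsum("ni,nj->nij", u_blocks, u_blocks)
    # apply each H_i to the corresponding row-block of W in one batched matmul
    W_blocks = weight.view(n, b, d_in)
    return torch.bmm(H, W_blocks).reshape(d_out, d_in)
```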
Qualitative evidence of learning rate robustness (on Subject-driven Generation with Stable Diffusion):
This code repository is based on the implementation of litgpt.
If you find ETHER useful, please consider citing our work:
@misc{bini2024ether,
title={ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections},
author={Massimo Bini and Karsten Roth and Zeynep Akata and Anna Khoreva},
year={2024},
eprint={2405.20271},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
This code repository is open-sourced under the MIT license.