Official PyTorch implementation of ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections, M. Bini, K. Roth, Z. Akata, A. Khoreva (ICML 2024)
TLDR: ETHER and its relaxation ETHER+ finetune pretrained models by applying hyperplane reflections to the pretrained weights (see the sketch after the list below). Both methods are extremely parameter-efficient (~10-100 times fewer parameters than OFT or LoRA) while remaining highly robust to learning rate and hyperparameter choices.
- ETHER is the fastest and most parameter-efficient (one vector per finetuned layer)
- ETHER+ is the best performing
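Below is a minimal sketch of the idea on a single linear layer, assuming the standard Householder form H = I - 2uu^T with a unit-norm u; the class and attribute names are illustrative, not the exact implementation in this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ETHERLinear(nn.Module):
    """Sketch of ETHER-style finetuning of a frozen linear layer.

    A single trainable vector `u` defines a hyperplane reflection
    H = I - 2 u u^T / ||u||^2, applied multiplicatively to the frozen
    pretrained weight W. Illustrative only.
    """

    def __init__(self, pretrained_linear: nn.Linear):
        super().__init__()
        self.weight = pretrained_linear.weight          # frozen pretrained W (out x in)
        self.bias = pretrained_linear.bias
        self.weight.requires_grad_(False)
        # one trainable vector per finetuned layer
        self.u = nn.Parameter(torch.randn(self.weight.shape[0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = F.normalize(self.u, dim=0)                  # unit-norm hyperplane normal
        H = torch.eye(u.numel(), device=u.device) - 2.0 * torch.outer(u, u)
        w = H @ self.weight                             # reflected weight H W
        return F.linear(x, w, self.bias)
```

Since a reflection is orthogonal, its distance from the identity stays fixed throughout training, which bounds how far finetuning can push the pretrained weights and is what the paper connects to the observed learning-rate robustness.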
This project is split into two separate directories, each containing the code and instructions to reproduce the results in the paper:
./ether-instruct
: code for finetuning Llama2-7B on Instruction Tuning. This is built on top of the litgpt repository, allowing for easy finetuning over a multitude of language models and datasets.
./ether-control
: code for finetuning Stable Diffusion on ControlNet-like tasks (such as Semantic Map to Image). [Work in Progress]
Choosing the number of diagonal blocks n:
TLDR: using block-diagonal transformations reduces computation and enables parallelization in multiplicative finetuning, significantly speeding up training. In ETHER/ETHER+, increasing n does not change the number of trainable parameters and has only a marginal impact on performance (see the block-diagonal sketch after the table below).
- increasing n leads to a significant speed-up on models with a large hidden dimension (e.g. 4096 for Llama2-7B vs. 2048 for Phi1.5-1.3B)
- surprisingly, on Llama2-7B Alpaca finetuning, n=32 gives the overall best results for ETHER/ETHER+
| ft method | Phi1.5-1.3B TFLOPs | rel. drop | Llama2-7B TFLOPs | rel. drop |
|---|---|---|---|---|
| ETHER n=1 | 9.13 | - | 25.26 | - |
| ETHER n=4 | 7.07 | -23% | 12.07 | -52% |
| ETHER n=32 | 6.71 | -27% | 8.22 | -68% |
| ETHER+ n=1 | 10.78 | - | 51.65 | - |
| ETHER+ n=4 | 7.69 | -29% | 18.66 | -64% |
| ETHER+ n=32 | 6.79 | -37% | 9.04 | -83% |
| (LoRA r=8) | (6.04) | - | (6.85) | - |
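A minimal sketch of the block-diagonal variant, assuming the same Householder form as above: the trainable vector is split into n chunks, each defining a smaller reflection applied to its own row-block of the frozen weight, so the parameter count is unchanged while the matmul cost drops roughly by a factor of n. Function and variable names are illustrative, not taken from the repository code:

```python
import torch
import torch.nn.functional as F

def block_diagonal_reflection(weight: torch.Tensor, u: torch.Tensor, n: int) -> torch.Tensor:
    """Apply n block-diagonal Householder reflections to a frozen weight.

    `u` has length d_out regardless of n (it is just split into n chunks),
    so the number of trainable parameters stays constant while each block
    multiply operates on a (d_out/n x d_out/n) matrix. Illustrative only.
    """
    d_out, d_in = weight.shape
    assert d_out % n == 0
    b = d_out // n                                       # block size
    u_blocks = F.normalize(u.view(n, b), dim=1)          # n unit-norm vectors
    # n small reflections H_i = I - 2 u_i u_i^T, shape (n, b, b)
    H = torch.eye(b, device=u.device) - 2.0 * torch.einsum("ni,nj->nij", u_blocks, u_blocks)
    # apply each H_i to the corresponding row-block of W in one batched matmul
    W_blocks = weight.view(n, b, d_in)
    return torch.bmm(H, W_blocks).reshape(d_out, d_in)
```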
Qualitative evidence of learning rate robustness (on Subject-driven Generation with Stable Diffusion):
This code repository is based on the implementation of litgpt.
If you find ETHER useful, please consider citing our work:
@misc{bini2024ether,
title={ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections},
author={Massimo Bini and Karsten Roth and Zeynep Akata and Anna Khoreva},
year={2024},
eprint={2405.20271},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
This code repository is open-sourced under the MIT license.