Subcloning, as described in the paper https://arxiv.org/pdf/2312.09299, is an initialization technique that transfers weights from a larger pretrained model to a smaller one. It is useful when you have a large model and want to initialize a smaller model from it, giving faster convergence and better performance than training from random initialization.
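For reference, here is a minimal sketch of what the weight transfer might look like in PyTorch. It assumes a hypothetical parameter naming scheme of the form `layers.<i>.<...>` and a caller-supplied layer mapping; it also uses simple leading-slice truncation, whereas the paper additionally ranks neurons by importance before selecting which ones to keep.

```python
import torch


def subclone_state_dict(src_state, dst_state, layer_map):
    """Build an initialization for a smaller model from a larger one's weights.

    src_state: state dict of the large (source) model
    dst_state: state dict of the small (destination) model
    layer_map: dict mapping destination layer index -> source layer index,
               e.g. {0: 0, 1: 2, 2: 4, ...} for uniform layer selection
    """
    new_state = {}
    for name, dst_tensor in dst_state.items():
        # Remap the layer index in the parameter name. This assumes names
        # like "layers.3.attn.q_proj.weight"; adjust to your model's naming.
        parts = name.split(".")
        src_name = name
        if parts[0] == "layers" and parts[1].isdigit():
            parts[1] = str(layer_map[int(parts[1])])
            src_name = ".".join(parts)
        src_tensor = src_state[src_name]
        # Truncate every dimension of the source tensor to the destination
        # shape. The paper selects neurons by importance; keeping the leading
        # slice is a crude stand-in for that step.
        slices = tuple(slice(0, d) for d in dst_tensor.shape)
        new_state[name] = src_tensor[slices].clone()
    return new_state
```

Assuming `big` and `small` are architecturally compatible transformers with, say, 24 and 12 layers, you would load the result with something like `small.load_state_dict(subclone_state_dict(big.state_dict(), small.state_dict(), {i: 2 * i for i in range(12)}))`.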
In my project, I subcloned a Llama 8B model into a 370M-parameter model and trained the subcloned model on the FineWeb dataset.
Unfortunately, I don't currently have the time to research extending this subcloning technique to general transformers. However, I have decided to release the subcloning method as is, and I'll also release a GPT-2 variant that was done by Keller.
Below is an image showing the convergence of training loss with and without subcloned weights: