
Just use huggingface #6

Open
dustydecapod opened this issue Mar 8, 2023 · 7 comments

Comments

@dustydecapod

All of the models are on huggingface already. https://huggingface.co/decapoda-research

There's even an open, working PR to add support to the transformers lib.

@shawwn
Owner

shawwn commented Mar 8, 2023

Sure, use whatever works. This repo is intended to serve as a point of communication about llama, and also as an extra mirror.

Note that Facebook has been issuing takedown requests against huggingface llama repositories, so those may get knocked offline.

@loretoparisi

All of the models are on huggingface already. https://huggingface.co/decapoda-research

There's even an open, working PR to add support to the transformers lib.

It's worth noting that those model files have been converted for use with the HF library, so they differ from the 7B model files here.

According to the authors, the model is in fact:

LLaMA-7B converted to work with Transformers/HuggingFace. This is under a special license, please see the LICENSE file for details.

So supposing we want to use those model files for C++ inference here, I'm not sure whether it would work.
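
To make the difference concrete, here is a minimal sketch (mine, not from either repo) that guesses which of the two layouts a local directory contains. The file names are the ones the original release and the HF conversion conventionally use (consolidated.*.pth plus params.json vs. sharded pytorch_model*.bin plus config.json); treat them as assumptions:

```python
import glob
import os
import sys

def detect_checkpoint_format(model_dir: str) -> str:
    """Guess whether a directory holds original LLaMA weights or an HF conversion.

    Original release:  consolidated.*.pth shard(s) plus a params.json.
    HF conversion:     pytorch_model*.bin shard(s) plus a config.json.
    """
    if os.path.exists(os.path.join(model_dir, "params.json")) and \
            glob.glob(os.path.join(model_dir, "consolidated.*.pth")):
        return "original"
    if os.path.exists(os.path.join(model_dir, "config.json")) and \
            glob.glob(os.path.join(model_dir, "pytorch_model*.bin")):
        return "huggingface"
    return "unknown"

if __name__ == "__main__":
    print(detect_checkpoint_format(sys.argv[1]))
```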

@tljstewart

@loretoparisi Yeah, I'm thinking along the same lines and trying to make sense of it here. There are 8-bit and 4-bit quantized versions, the original weights, and the Hugging Face versions... I think the C++ inference code takes the original weights and converts them to ggml (the author's own format), and also does the quantization...?

Can this be confirmed?

Also, I am currently downloading over IPFS; the current ETA is 2d 9h 42m for 65B... since the magnet link in this repo seems to be down, as does Hugging Face...

Any thoughts on the model formats with C++ or a way to download the weights faster?

@loretoparisi

Yes, confirmed. You first convert the weights to ggml FP16 or FP32, then quantize to 4-bit and run inference (CPU only).
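
As a sketch, the pipeline looks like this, driven from Python via subprocess. The script and binary names below follow the llama.cpp README of around March 2023 and have since been renamed, and MODEL_DIR is a placeholder for wherever the original consolidated.*.pth weights live, so check the README of whatever revision you actually have:

```python
import subprocess

MODEL_DIR = "models/7B"  # placeholder: original LLaMA weights + params.json

# 1. Convert the original PyTorch weights to ggml FP16
#    (the trailing "1" selects f16; "0" would select f32).
subprocess.run(["python", "convert-pth-to-ggml.py", MODEL_DIR, "1"], check=True)

# 2. Quantize the FP16 ggml file down to 4-bit (q4_0).
subprocess.run(["./quantize",
                f"{MODEL_DIR}/ggml-model-f16.bin",
                f"{MODEL_DIR}/ggml-model-q4_0.bin", "2"], check=True)

# 3. Run CPU-only inference on the quantized model.
subprocess.run(["./main", "-m", f"{MODEL_DIR}/ggml-model-q4_0.bin",
                "-p", "Building a website can be done in 10 simple steps:",
                "-n", "128"], check=True)
```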

@tljstewart

tljstewart commented Mar 13, 2023

Ah OK, so you're supposed to get the originally released weights, and the C++ code converts them? Also, I found a torrent link for the original weights and it's going extremely fast, ETA 3 hours for 235GB.

webtorrent download o8a7xw.torrent

@loretoparisi

loretoparisi commented Mar 13, 2023

Yes, this is exactly what I did with the download from here.

@risos8200

You can also use https://huggingface.co/huggyllama; it works with llama.cpp.
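
For completeness, a minimal sketch of pulling one of those checkpoints with huggingface_hub; the repo id huggyllama/llama-7b is one of the repos under that account, and whether your llama.cpp revision's convert script accepts HF-format checkpoints directly is something to verify for the version you have:

```python
from huggingface_hub import snapshot_download

# Download the HF-format checkpoint into the local HF cache; the returned
# path points at the snapshot directory, which you can then feed to the
# llama.cpp conversion step.
path = snapshot_download(repo_id="huggyllama/llama-7b")
print(path)
```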
