
Merge HF LoRA adapter with a quantized GPT-J model using ggml #534

Open
webpolis opened this issue Sep 23, 2023 · 0 comments

Comments

webpolis commented Sep 23, 2023

Hello!

I have fine-tuned a GPT-J base model (loaded in 4-bit) using HF + LoRA. I quantized the same base model to q4_0 using ggml, and it loads perfectly fine with the built examples/gpt-j binaries. Since HF doesn't yet support saving a 4-bit-loaded model together with its adapters, I need to find a different way to accomplish this.

I want to "merge" the LoRA adapters (converting them to ggml first?) into this q4_0 version so I can run inference on the CPU.

Any hints?
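
For context, the usual PEFT-side merge (folding the adapter into an unquantized, e.g. fp16, copy of the base model before any ggml quantization) would look roughly like the sketch below; the model id and paths are placeholders, not taken from this issue.

```python
# Sketch only: merge a LoRA adapter into an unquantized copy of the base model
# with PEFT, then hand the merged checkpoint to the ggml conversion/quantization
# scripts. "EleutherAI/gpt-j-6b", "./lora-adapter" and "./gpt-j-merged" are
# placeholder names, not paths from this issue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b",          # full/half-precision base, not the 4-bit load
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "./lora-adapter")  # attach the LoRA weights
model = model.merge_and_unload()    # fold the adapter into the base weights

model.save_pretrained("./gpt-j-merged")
AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b").save_pretrained("./gpt-j-merged")
```

The merged checkpoint could then go through the same examples/gpt-j conversion and quantization steps that produced the q4_0 file.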

webpolis reopened this Sep 29, 2023