Hello!
I have fine-tuned a GPT-J base model (loaded in 4-bit) using Hugging Face + LoRA. I quantized the same base model to q4_0 with ggml, and it loads fine using the built examples/gpt-j binaries. Since it isn't yet possible to save a 4-bit-loaded HF model together with its adapters, I need to find another way to accomplish this.
I want to "merge" the LoRA adapters (converting them to ggml first?) into this q4_0 model so I can run inference on the CPU.
Any hints?
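For context, here is a minimal sketch of the kind of workaround I've been considering: merge the adapters into an fp16 copy of the base model with PEFT, then convert and quantize the merged checkpoint with ggml, rather than merging directly into the q4_0 file. This assumes the adapters were trained with PEFT; the model ID, adapter path, and output directory below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in fp16 (not 4-bit), so the LoRA weights can be merged in.
base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=torch.float16,
)

# Attach the trained adapters and fold them into the base weights.
model = PeftModel.from_pretrained(base, "path/to/lora-adapters")  # placeholder path
merged = model.merge_and_unload()

# Save a standard HF checkpoint of the merged model.
merged.save_pretrained("gpt-j-merged")
AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B").save_pretrained("gpt-j-merged")

# Then convert and quantize with ggml, e.g. (exact paths/flags depend on your checkout):
#   python examples/gpt-j/convert-h5-to-ggml.py gpt-j-merged 1
#   ./bin/gpt-j-quantize gpt-j-merged/ggml-model-f16.bin gpt-j-q4_0.bin 2
```

The drawback is that it requires enough RAM to hold the base model in fp16 and a full re-quantization, which is why merging the adapters into the already-quantized q4_0 model directly would be preferable.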