Hello!
I have fine-tuned a GPT-J base model (loaded in 4-bit) using Hugging Face + LoRA. I quantized the same base model to q4_0 with ggml, and it loads fine using the built examples/gpt-j binaries. Since it isn't yet possible to save a 4-bit-loaded HF model together with its adapters, I need to find another way to accomplish this.
I want to "merge" the LoRA adapters (converting them to ggml first?) into this q4_0 model so I can run inference on the CPU.
Any hints?
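For context, here is a minimal sketch of the kind of workaround I've been considering: merge the adapters into an fp16 copy of the base model with PEFT, then convert and quantize the merged checkpoint with ggml, rather than merging directly into the q4_0 file. This assumes the adapters were trained with PEFT; the model ID, adapter path, and output directory below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in fp16 (not 4-bit), so the LoRA weights can be merged in.
base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=torch.float16,
)

# Attach the trained adapters and fold them into the base weights.
model = PeftModel.from_pretrained(base, "path/to/lora-adapters")  # placeholder path
merged = model.merge_and_unload()

# Save a standard HF checkpoint of the merged model.
merged.save_pretrained("gpt-j-merged")
AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B").save_pretrained("gpt-j-merged")

# Then convert and quantize with ggml, e.g. (exact paths/flags depend on your checkout):
#   python examples/gpt-j/convert-h5-to-ggml.py gpt-j-merged 1
#   ./bin/gpt-j-quantize gpt-j-merged/ggml-model-f16.bin gpt-j-q4_0.bin 2
```

The drawback is that it requires enough RAM to hold the base model in fp16 and a full re-quantization, which is why merging the adapters into the already-quantized q4_0 model directly would be preferable.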