How to decrease inference time? #245
Comments
Check out llama.cpp: ggerganov/llama.cpp#771
Isn't that repo CPU-only? @rahulvigneswaran asked about GPU, and I am curious too. I'm only getting about 3 tokens per second on my 4090 in 8-bit mode.
Also, how can I run in 8-bit mode? Is that how the model runs when following the general instructions in the README?
If you do not have additional compute, I'd say you might want to use the compressed/quantized Vicuna (we provided an official 8-bit version yesterday). It will give you slightly higher throughput or lower latency, but at somewhat compromised quality. We're working on pushing some of our system technologies out to optimize inference and throughput speed, but it will take a while to land.
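For reference, here is a minimal sketch of loading a checkpoint in 8-bit via bitsandbytes through Hugging Face Transformers. The model path is a placeholder, and depending on your transformers version you may need to pass a `BitsAndBytesConfig` instead of `load_in_8bit=True`; FastChat's CLI also has an 8-bit option (`--load-8bit`), if I remember correctly.

```python
# Sketch: load a causal LM in 8-bit with bitsandbytes (placeholder model path).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/vicuna-7b"  # placeholder; point to your local or hub checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # requires bitsandbytes; newer transformers prefer BitsAndBytesConfig
    device_map="auto",   # place layers on available GPUs automatically
)

prompt = "Describe a sunset over the ocean."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```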
It takes around 40 minutes to generate 1,000 sentences describing something on a single V100 (32 GB). How can I reduce this and speed up generation?
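When generating many independent sentences, the biggest single win is usually batching prompts through `generate()` rather than looping one prompt at a time. Below is a minimal sketch assuming a Hugging Face Transformers checkpoint in fp16 on a single GPU; the model path, prompts, batch size, and sampling settings are placeholders to tune against V100 (32 GB) memory.

```python
# Sketch: batched generation to increase throughput for many independent prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/vicuna-7b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.padding_side = "left"                # left-pad so new tokens append correctly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as padding if none is defined

model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompts = [f"Describe object number {i}." for i in range(1000)]  # stand-in prompts
batch_size = 16  # placeholder; raise until you hit the memory limit
outputs = []
for start in range(0, len(prompts), batch_size):
    batch = prompts[start:start + batch_size]
    enc = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
    with torch.inference_mode():
        out = model.generate(**enc, max_new_tokens=64, do_sample=True, top_p=0.9)
    outputs.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
```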