
[Question] Can LLaVA run inference on CPU? #865

Open
wenli135 opened this issue Nov 27, 2023 · 5 comments

Comments

@wenli135

Question

I was trying to run LLaVA inference on CPU, but it complains "Torch not compiled with CUDA enabled". I noticed that cuda() is called when loading the model. If I remove all the cuda() invocations, is it possible to run inference on CPU?

Thanks.

@papasanimohansrinivas

You need to install the CPU build of torch and set the device map to CPU when loading the model @wenli135

For reference, a minimal sketch of what that could look like with the repo's own loader. The device_map/device arguments are assumptions on my side; check llava/model/builder.py in your checkout to confirm they exist in your version:
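```python
# Sketch only: load LLaVA weights on CPU with a CPU-only torch build.
# Assumes llava.model.builder.load_pretrained_model accepts device_map/device;
# verify against your version of the repo.
import torch
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "liuhaotian/llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
    device_map="cpu",  # keep every module on CPU instead of calling .cuda()
    device="cpu",
)
model = model.to(dtype=torch.float32)  # plain float32 is the safest dtype on CPU
```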

@morteza102030

You need to install the CPU build of torch and set the device map to CPU when loading the model @wenli135

Is it possible for you to give a complete example of how to run LLaVA_13b_4bit_vanilla_colab without a GPU?

@akkimind

akkimind commented Dec 1, 2023

I made some changes in the code to run inference on CPU. The model loads, but while trying to optimize the model (model = ipex.optimize(model, dtype=torch.bfloat16)) I get an error:
BF16 weight prepack needs the cpu support avx512bw, avx512vl and avx512dq, please set dtype to torch.float or set weights_prepack to False
If I set dtype to torch.float, the model doesn't support it, and if I set weights_prepack to False, the model takes forever to return a response. Is there a specific CPU I should use?

The error message itself points at the two fallbacks. A small sketch of both, with a toy module standing in for the LLaVA model (assumed to already be loaded on CPU):
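```python
import torch
import intel_extension_for_pytorch as ipex

# Toy module standing in for the LLaVA model already loaded on CPU.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()

# Option 1: optimize in float32, which does not need BF16 weight prepack.
model_fp32 = ipex.optimize(model, dtype=torch.float32)

# Option 2: keep bfloat16 but disable weight prepacking, so the
# avx512bw/avx512vl/avx512dq requirement goes away (at a speed cost).
model_bf16 = ipex.optimize(model, dtype=torch.bfloat16, weights_prepack=False)
```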

@ratan

ratan commented Jan 9, 2024

Was anyone able to run LLaVA inference on CPU without installing the Intel Extension for PyTorch environment? Any pointer would be really helpful.

@feng-intel
Contributor

Hi Ratan,
Here is a bare-metal Intel CPU solution for LLMs, intel xFasterTransformer, but there is no LLaVA support yet. You can try that first.
llama.cpp also supports CPU inference; we will enable Intel dGPU/iGPU support later.

Could you tell us why you don't want to use the Intel Extension for PyTorch? Thanks.

For the llama.cpp route, here is a sketch using the llama-cpp-python binding with a GGUF conversion of LLaVA running fully on CPU. The model and projector file names below are placeholders for whatever conversion you have locally:
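```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder GGUF files: a converted LLaVA checkpoint plus its CLIP projector.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="llava-v1.5-7b-Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,
    n_gpu_layers=0,  # 0 offloaded layers = pure CPU inference
)

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }]
)
print(out["choices"][0]["message"]["content"])
```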
