RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. #161

Closed
RakshitAralimatti opened this issue Jun 21, 2024 · 4 comments

@RakshitAralimatti

Getting this error when trying to quantize llama3-8b-instruct on a T4 GPU.
torch version - 2.1.1

@wenhuach21
Contributor

Thank you for trying AutoRound. Could you kindly attach more of the log?

@RakshitAralimatti
Author

RakshitAralimatti commented Jun 21, 2024

Thanks for your response.

Code I am running:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

from auto_round import AutoRound

bits, group_size, sym = 4, 128, False

autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size, sym=sym, device=None)
autoround.quantize()
output_dir = "4bit_autoRound"
autoround.save_quantized(output_dir)
```

Error I am getting:

```
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.13it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-06-21 05:22:46 INFO utils.py L527: Using GPU device
2024-06-21 05:22:46 INFO autoround.py L464: using torch.float16 for quantization tuning
2024-06-21 05:22:52 INFO autoround.py L851: switch to cpu to cache inputs
2024-06-21 05:23:09 INFO autoround.py L1306: quantizing 1/32, model.layers.0
Traceback (most recent call last):
  File "/home/sandlogic/LINGO/LINGO_PROJECTS/ModelQuantization/Intel_AutoRound/llama3_4bit_Autoround.py", line 12, in <module>
    autoround.quantize()
  File "/home/sandlogic/LINGO/LINGO_ENV/ModelQuantization/lib/python3.10/site-packages/auto_round/autoround.py", line 575, in quantize
    self.quant_blocks(
  File "/home/sandlogic/LINGO/LINGO_ENV/ModelQuantization/lib/python3.10/site-packages/auto_round/autoround.py", line 1316, in quant_blocks
    q_input, input_ids = self.quant_block(
  File "/home/sandlogic/LINGO/LINGO_ENV/ModelQuantization/lib/python3.10/site-packages/auto_round/autoround.py", line 1208, in quant_block
    self.scale_loss_and_backward(scaler, loss)
  File "/home/sandlogic/LINGO/LINGO_ENV/ModelQuantization/lib/python3.10/site-packages/auto_round/autoround.py", line 1470, in scale_loss_and_backward
    scale_loss.backward()
  File "/home/sandlogic/LINGO/LINGO_ENV/ModelQuantization/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/sandlogic/LINGO/LINGO_ENV/ModelQuantization/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
```

CUDA version - 11.8
torch version - 2.1.1
Python version - 3.10

@wenhuach21
Contributor

wenhuach21 commented Jun 21, 2024

Thanks for the information. AutoRound requires a gradient backward pass; however, the root cause appears to lie in torch or SDPA attention. Please refer to pytorch/pytorch#98140 or try other solutions online.
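
For reference, one possible workaround (a sketch, not verified on a T4 here) is to disable the flash and memory-efficient SDPA backends so the backward pass falls back to the math kernel; the T4 is sm_75, and those fused kernels only implement backward on sm_80/sm_90. The calls below are standard torch 2.x APIs:

```python
import torch

# Assumption: the failing path is the flash / memory-efficient SDPA backward,
# which requires sm_80/sm_90. Force the math fallback globally before calling
# autoround.quantize() so backward runs on the plain (slower) kernel.
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(True)
```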

@wenhuach21
Contributor

Or see huggingface/accelerate#2799.
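
Another option (a sketch, assuming the installed transformers version supports the `attn_implementation` argument) is to load the model with eager attention so SDPA is bypassed entirely:

```python
from transformers import AutoModelForCausalLM
import torch

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
# Use the eager attention implementation instead of SDPA so the backward
# pass never reaches the kernels that require sm_80/sm_90.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    attn_implementation="eager",
    trust_remote_code=True,
)
```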
