
[Question] Have you finished LoRA/QLoRA training? #206

Open
Iceland-Leo opened this issue Jun 1, 2023 · 12 comments

Comments

@Iceland-Leo

Question

Nice work!
I'm just wondering when you will finish LoRA/QLoRA training. It would help a lot with fine-tuning.

@Iceland-Leo
Author

Has anyone completed LoRA/QLoRA training on LLaVA?

@haotian-liu
Owner

Hi, thank you for your interest in our work. We are implementing LoRA/QLoRA training and are now verifying its correctness by training some model checkpoints. We will update soon once the performance is verified. Thanks!

@XipengY

XipengY commented Jun 7, 2023

@haotian-liu Looking forward to the implementation of LoRA/QLoRA training. Could you put the code in a development branch so we can try to develop and verify it together?

@haotian-liu
Owner

Hi @XipengY @Iceland-Leo

LoRA support (preview) and an initial checkpoint I trained are released here. Please let me know if anything in the instructions is unclear.

Also, if any of you are interested in contributing to the hyperparameter search, please let me know.

Note that QLoRA support is partially finished, but distributed training is not supported yet.
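For anyone new to the technique: LoRA's memory/compute saving comes from training only a low-rank update instead of the full weight matrix. A minimal, framework-independent sketch of that parameter-count arithmetic (the 4096x4096 projection size and rank 64 are illustrative assumptions, not LLaVA's actual configuration):

```python
def lora_param_counts(d_in, d_out, r):
    # Full fine-tuning updates the entire d_out x d_in weight matrix.
    full = d_out * d_in
    # LoRA freezes that matrix and trains only two low-rank factors:
    # B (d_out x r) and A (r x d_in), so the update is W + B @ A.
    lora = d_out * r + r * d_in
    return full, lora

# Example: one hypothetical 4096x4096 projection, LoRA rank 64.
full, lora = lora_param_counts(4096, 4096, 64)
ratio = lora / full  # LoRA trains ~3% of this layer's parameters
```

Note that the rank r (and the number of layers LoRA is attached to) is exactly the kind of hyperparameter the search above is about.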

@devrituraj

Hi...

I find that LoRA fine-tuning is slower (5.72 s/it, about half the speed) compared to full fine-tuning (2.77 s/it). The same LR (2e-5), batch size (4), and grad-accum (1) were used for both setups. The script was run using torchrun with a DeepSpeed config (zero3.json) as an argument. Has anyone made similar observations? Intuitively, LoRA should be faster!

btw, wonderful work!

@haotian-liu
Owner

Hi @devrituraj

Thank you for trying this out and for providing the feedback. Which GPUs (and how many) are you using? Can you provide your commands? And do you notice a reduction in GPU memory consumption?

For smaller models I noticed a smaller performance benefit. It would help to know more about the specific configuration on your side. Thanks!

@aprilehannibal

aprilehannibal commented Jun 15, 2023

@haotian-liu How many hours does it take to fine-tune the 13B model with LoRA?

@XipengY

XipengY commented Jun 16, 2023

@haotian-liu It's great that this work supports DeepSpeed and LoRA; it significantly reduces GPU memory. However, when I change tf32 and bf16 to False, training fails. Do you have the same problem?

@YerongLi

YerongLi commented Jul 21, 2023

Does anyone complete the lora/qlora training on llava?

I think LoRA works now, but quantization does not work for me. Has anyone completed QLoRA training?

Quantization reports this error:

  File "/scratch/yerong/LLaVA/llava/train/train.py", line 657, in train
    model = LlavaLlamaForCausalLM.from_pretrained(
  File "/scratch/yerong/.conda/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2694, in from_pretrained
    dispatch_model(model, device_map=device_map, offload_dir=offload_folder, offload_index=offload_index)
  File "/scratch/yerong/.conda/envs/llava/lib/python3.10/site-packages/accelerate/big_modeling.py", line 371, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "/scratch/yerong/.conda/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 506, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "/scratch/yerong/.conda/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
  File "/scratch/yerong/.conda/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 253, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "/scratch/yerong/.conda/envs/llava/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 267, in set_module_tensor_to_device
    raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
ValueError: weight is on the meta device, we need a `value` to put in on cuda:0.
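Setting the loading error above aside, the "quantization" QLoRA performs on the frozen base weights can be illustrated with a toy round-trip. This is a plain absmax scheme for intuition only, not the NF4 data type bitsandbytes actually uses:

```python
def quantize_absmax(xs, bits=4):
    # Symmetric absmax quantization: scale values so the largest
    # magnitude maps to the top of the signed integer range.
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    scale = max(abs(x) for x in xs) / qmax
    q = [round(x / scale) for x in xs]  # small ints, ~4 bits each
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.21, -0.07, 0.14, -0.28]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
# Each restored value is within half a quantization step of the
# original; the LoRA adapters then train in full precision on top.
```

The memory saving is that the frozen weights are stored as 4-bit integers plus one scale per block, while gradients only flow through the small LoRA factors.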

@haotian-liu
Owner

@aprilehannibal I do not see much of a difference between LoRA and full fine-tuning; both take around 1.5-2 hours on the lightning fine-tuning.

@haotian-liu
Owner

@YerongLi

I have updated the QLoRA support. Please pull the latest repo and reinstall the packages using pip install -e .

See example script here.

@lilia147852

lilia147852 commented Jun 18, 2024

@haotian-liu
Excuse me, can I implement QLoRA for LLaVA v1.5?
