How to run the finetuned model with LoRA adapters. #59

Open
thisurawz1 opened this issue Jul 23, 2024 · 6 comments

@thisurawz1

I have successfully fine-tuned the model using QLoRA for a custom use case. Now I have the LoRA adapters. Can you tell me how to use them for inference? Maybe merge the LoRA weights with the original model and then run inference?

@Yogesh914

Hi @thisurawz1, I was wondering if you were available for a call or text. We are currently experiencing some issues when fine-tuning with the finetune_lora.sh file and were wondering if we could use your guidance.

I have Discord as well if you prefer; let me know what works best for you.

@thisurawz1
Author

thisurawz1 commented Jul 24, 2024

You can contact me on Discord: "wick6309". However, I'm not very active on Discord and mainly use WeChat. Anyway, I've posted all the issues I encountered and their solutions below for everyone's reference.

I mainly used the QLoRA script and did a fine-tuning trial run. My dataset was quite small, around 229 samples (image and text). I encountered the following issues during fine-tuning. I used one A100 40GB GPU, but the VRAM was not enough to run the QLoRA script with a batch size of 4, so I had to change it to 2.

1 Adjust the number of GPUs to those available on your machine

  • Error: RuntimeError: CUDA error: invalid device ordinal
  • Solution: Change the number of devices in the script to the available devices on your machine.
  • code
    # Check how many GPUs are available on your device
    import torch
    print(torch.cuda.device_count())

    # In the LoRA or QLoRA script, adjust these to the available GPUs (in my case, just 1 GPU)
    ARG_WORLD_SIZE=${1:-1}
    ARG_NPROC_PER_NODE=${2:-1}

2 Hugging Face offline mode error

  • code
    export TRANSFORMERS_OFFLINE=0 # Temporarily disable offline mode in the script

3 Cannot access "mistralai/Mistral-7B-Instruct-v0.2" as it is a gated repo

  • Error: Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/config.json. Access to model mistralai/Mistral-7B-Instruct-v0.2 is restricted. You must be authenticated to access it.
  • Solution: First, go to the model's repo on Hugging Face and request access, then copy your Hugging Face read token.
  • code
    # Add this to the script file
    from huggingface_hub import login
    login(token="your_huggingface_token")

4 mm_projector.bin couldn't be found

  • Error: FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/hub/models--DAMO-NLP-SG--VideoLLaMA2-7B-Base/snapshots/main/mm_projector.bin'
  • Solution: Make sure you download mm_projector.bin to the correct path.
  • code
    python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='DAMO-NLP-SG/VideoLLaMA2-7B-Base', filename='mm_projector.bin')"

5 Change the dataset path and folder

  • Solution: Follow the repo guide to build the dataset structure (an illustrative entry is sketched after this list).
  • code
    --data_path datasets/custom_sft/custom.json \
    --data_folder datasets/custom_sft/

6 NCCL error/ CUDA error/ Not enough VRAM

  • Error: torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1704987394225/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1691, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.19.3
    ncclUnhandledCudaError: Call to CUDA function failed.
    Last error:
    Failed to CUDA calloc async 24 bytes
  • Solution: This is mainly because you don't have enough VRAM. You can reduce the batch size. For QLoRA fine-tuning, at least 40GB of VRAM is recommended; if you have more VRAM, you can increase the batch size. (See the batch-size sketch after this list for how these values relate.)
  • code
    # Training Arguments
    GLOBAL_BATCH_SIZE=128   # Reduce from 128 to 64, or keep 128 if VRAM allows
    LOCAL_BATCH_SIZE=2      # Changed from 4 to 2 in the script
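On item 5, here is a minimal sketch of what a single entry in custom.json might look like. This is only an assumption based on the LLaVA-style conversation format that repos in this family tend to use; the file names, paths, and keys below are illustrative, so check the repo's custom-SFT guide for the authoritative schema.

    # Illustrative only: an assumed LLaVA-style entry for datasets/custom_sft/custom.json.
    # Verify the exact keys against the repo's dataset guide before training.
    import json

    example = [
        {
            "id": 0,
            "video": "videos/sample_0001.mp4",  # path relative to --data_folder
            "conversations": [
                {"from": "human", "value": "<video>\nWhat is happening in this clip?"},
                {"from": "gpt", "value": "A person is assembling a wooden chair."},
            ],
        }
    ]

    with open("datasets/custom_sft/custom.json", "w") as f:
        json.dump(example, f, indent=2)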
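On item 6, my understanding (an assumption about how scripts of this style derive gradient accumulation, not a quote from the repo) is that the per-GPU batch size, the GPU count, and the accumulation steps multiply out to the global batch size, so shrinking LOCAL_BATCH_SIZE on a single 40GB GPU just raises the accumulation steps. A rough sanity check:

    # Assumed relationship: GLOBAL = LOCAL * num_gpus * grad_accum_steps.
    # Verify against how your finetune script actually computes gradient accumulation.
    GLOBAL_BATCH_SIZE = 128
    LOCAL_BATCH_SIZE = 2   # reduced from 4 to fit a single 40GB A100
    NUM_GPUS = 1

    grad_accum_steps = GLOBAL_BATCH_SIZE // (LOCAL_BATCH_SIZE * NUM_GPUS)
    print(grad_accum_steps)  # 64 accumulation steps per optimizer update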


@Yogesh914

Yogesh914 commented Jul 24, 2024

Hey @thisurawz1, thanks a lot for the reply; it made things clear. I am working with @lucasxu777 on this, so if you could add him, that would be great since he has WeChat! I have also added you on Discord; ".yogiii" is my username.

@Yogesh914

> I have successfully fine-tuned the model using QLoRA for a custom use case. Now I have the LoRA adapters. Can you tell me how to use them for inference? Maybe merge the LoRA weights with the original model and then run inference?

It was solved here: #32

@thisurawz1
Author

> Hey @thisurawz1, thanks for sharing the information here!!! I wonder if I can add you on WeChat so that we can make the conversations easier maybe for future work :)). My WeChat account is: kjw4LV

Noted. I'll add you.

@thisurawz1
Author

> I have successfully fine-tuned the model using QLoRA for a custom use case. Now I have the LoRA adapters. Can you tell me how to use them for inference? Maybe merge the LoRA weights with the original model and then run inference?
>
> It was solved here: #32

Thanks. I'll add your friend. Is there any proper guide on how to do inference with the LoRA fine-tuned model?
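Not an official guide, but as a generic illustration: for a plain PEFT LoRA checkpoint, merging the adapters into the base weights usually looks roughly like the sketch below. The paths are placeholders and the plain AutoModelForCausalLM loader is an assumption; VideoLLaMA2 also needs its vision tower and mm_projector loaded, so follow #32 and the repo's own inference utilities for the real workflow.

    # Generic PEFT merge sketch (assumed paths; not the repo's own script).
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "DAMO-NLP-SG/VideoLLaMA2-7B-Base"        # base checkpoint (assumed)
    adapter_dir = "checkpoints/videollama2_qlora"       # your LoRA output dir (assumed)
    merged_dir = "checkpoints/videollama2_qlora_merged"

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_dir)  # attach the LoRA adapters
    model = model.merge_and_unload()                      # fold adapter weights into the base

    model.save_pretrained(merged_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(merged_dir)

The merged directory can then be loaded like any regular checkpoint for inference, keeping in mind the multimodal components mentioned above.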
