Train with llava-llama3 #8
Hi @hellangleZ, thank you for your interest in our work. Please make sure that you have followed the steps below correctly for running the training.

STEP 1: Ensure that all dependencies are installed correctly. Follow the instructions below for installation.
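(The original install snippet did not survive here; as a stand-in, a minimal sketch assuming the standard layout with the LLaVA sources vendored inside LLaVA-pp and the usual LLaVA install flow. Defer to the repo README for the authoritative commands.)

```bash
# Sketch only: clone LLaVA-pp and install the bundled LLaVA's dependencies.
git clone https://github.com/mbzuai-oryx/LLaVA-pp.git
cd LLaVA-pp
git submodule update --init --recursive   # fetch LLaVA, if tracked as a submodule

cd LLaVA
pip install -e .                          # base dependencies
pip install -e ".[train]"                 # training extras (deepspeed, ninja, wandb)
pip install flash-attn --no-build-isolation
```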
STEP 2: Ensure that you have the correct transformers version. Please install transformers using the following command.
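(The pinned command itself is missing from this transcript; purely as an illustration, assuming a from-source build is required, which is consistent with the transformers 4.41.0.dev0 reported later in this thread. Check the repo README for the exact commit to install.)

```bash
# Illustrative stand-in for the pinned install command.
pip install git+https://github.com/huggingface/transformers
```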
STEP 3: Ensure that you copied all the relevant files to the LLaVA directory. For LLaMA-3, do the following.
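(A hedged sketch of the copy step; this file list is an assumption based on the repo's LLaMA-3-V directory and may differ from the README.)

```bash
# Assumed file list: overwrite the stock LLaVA sources with the
# LLaMA-3 variants from LLaVA-pp's LLaMA-3-V directory.
cp LLaMA-3-V/train.py        LLaVA/llava/train/train.py
cp LLaMA-3-V/conversation.py LLaVA/llava/conversation.py
cp LLaMA-3-V/builder.py      LLaVA/llava/model/builder.py
cp LLaMA-3-V/llava_llama.py  LLaVA/llava/model/language_model/llava_llama.py
```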
For Phi-3, do the following.
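(Again a hedged sketch, mirroring the LLaMA-3 step with the repo's Phi-3-V directory; the exact file names are assumptions.)

```bash
# Assumed file list for the Phi-3 variant.
cp Phi-3-V/train.py         LLaVA/llava/train/train.py
cp Phi-3-V/conversation.py  LLaVA/llava/conversation.py
cp Phi-3-V/builder.py       LLaVA/llava/model/builder.py
cp Phi-3-V/llava_phi3.py    LLaVA/llava/model/language_model/llava_phi3.py
```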
STEP 4: Make sure you are using
STEP 5: Make sure to use meta-llama/Meta-Llama-3-8B-Instruct as the base model for LLaMA-3 based trainings, and microsoft/Phi-3-mini-4k-instruct as the base model for Phi-3 based trainings (see the sketch below).

I hope this will solve the issue. If it does not, please provide step-by-step instructions so that we can reproduce the issue and assist you better. Good luck :)
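(A minimal launch excerpt showing only the base-model argument; the launcher and all other flags are assumptions here, and the full commands live in the repo's training scripts.)

```bash
# Sketch: the key point is the --model_name_or_path value per backbone.
deepspeed llava/train/train_mem.py \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct
    # ...remaining arguments as in the repo's training scripts

# For Phi-3 based trainings, swap in:
#   --model_name_or_path microsoft/Phi-3-mini-4k-instruct
```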
Hi @mmaaz60, should it be in the LLaVA folder or just the LLaVA-pp folder?
Hi @hellangleZ, it should be in the LLaVA-pp/LLaVA folder.
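(To illustrate the layout, assuming a default clone; the directory names other than LLaVA-pp/LLaVA are taken from the steps above.)

```bash
# Expected layout (sketch):
# LLaVA-pp/
# ├── LLaMA-3-V/   # LLaMA-3 replacement files
# ├── Phi-3-V/     # Phi-3 replacement files
# └── LLaVA/       # official LLaVA code -- STEP 3 copies files into here
ls LLaVA-pp/LLaVA/llava/train/train.py   # should exist after STEP 3
```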
Hi @mmaaz60, I copied everything step by step, but there is still a bug:
Hi @mmaaz60, the same issue also occurs with the LLaMA-3 pretraining:
Hello @mmaaz60, I have been following the installation process you provided exactly, with the exception of the version of
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llava 1.2.2.post1 requires tokenizers==0.15.1, but you have tokenizers 0.19.1 which is incompatible.
llava 1.2.2.post1 requires transformers==4.37.2, but you have transformers 4.41.0.dev0 which is incompatible.
Could you please help me diagnose and resolve these issues? Here's my current environment setup:
Thank you for your help!
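(An aside on the conflict above: pip is reporting that llava 1.2.2.post1 pins tokenizers==0.15.1 and transformers==4.37.2 in its own metadata while newer versions are installed; if STEP 2 was followed, the newer transformers is intentional. A generic way to confirm what is actually installed:)

```bash
pip show transformers tokenizers | grep -E "^(Name|Version)"
```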
> STEP 2, STEP 3

Great, it works now. It was a DeepSpeed issue.
Hi @Luo-Z13,
Please make sure that the baseline official LLaVA code is working properly. Then make sure to copy all the LLaMA-3 related files into the corresponding directory. Lastly, please note that to run a LLaMA-3 based training you need to pass
I hope this will help and solve the issue. Good luck.
Hi @hellangleZ @Luo-Z13, I am closing this issue as @hellangleZ was able to run the trainings. Please feel free to create a new issue if you have any questions or encounter any other error. I appreciate your cooperation. Thank you.
After starting the pre-training, there is a bug:
Traceback (most recent call last):
File "/data2/LLaVA-pp/LLaVA/llava/train/train_mem.py", line 4, in
train(attn_implementation="flash_attention_2")
File "/data2/LLaVA-main/llava/train/train.py", line 969, in train
trainer.train()
File "/data22/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1876, in train
return inner_training_loop(
File "/data22/llava/lib/python3.10/site-packages/transformers/trainer.py", line 2187, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/data22/llava/lib/python3.10/site-packages/accelerate/data_loader.py", line 452, in iter
current_batch = next(dataloader_iter)
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in next
data = self._next_data()
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/data22/llava/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/data2/LLaVA-main/llava/train/train.py", line 751, in call
input_ids = torch.nn.utils.rnn.pad_sequence(
File "/data22/llava/lib/python3.10/site-packages/torch/nn/utils/rnn.py", line 400, in pad_sequence
return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
TypeError: pad_sequence(): argument 'padding_value' (position 3) must be float, not NoneType
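(For context on this specific TypeError: it usually means tokenizer.pad_token_id is None, as with the stock LLaMA-3 tokenizer, and gets forwarded to pad_sequence as padding_value. A minimal, self-contained reproduction under that assumption; whether this was the root cause here is not confirmed, and the fix reported above was attributed to DeepSpeed.)

```python
import torch

# pad_token_id is None when the tokenizer defines no pad token, which is
# the default for the LLaMA-3 tokenizer -- the collator then forwards
# None to pad_sequence as padding_value.
pad_token_id = None
batch = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]

try:
    torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=pad_token_id)
except TypeError as err:
    print(err)  # argument 'padding_value' (position 3) must be float, not NoneType

# Usual remedy (placement here is hypothetical): give the tokenizer a pad
# token before the data module is built, so pad_token_id is a real integer:
#   tokenizer.pad_token = tokenizer.eos_token
```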