Train with llava-llama3 #8

Closed
hellangleZ opened this issue Apr 30, 2024 · 9 comments

@hellangleZ

After starting pretraining, I hit the following error:

Traceback (most recent call last):
File "/data2/LLaVA-pp/LLaVA/llava/train/train_mem.py", line 4, in
train(attn_implementation="flash_attention_2")
File "/data2/LLaVA-main/llava/train/train.py", line 969, in train
trainer.train()
File "/data22/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1876, in train
return inner_training_loop(
File "/data22/llava/lib/python3.10/site-packages/transformers/trainer.py", line 2187, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/data22/llava/lib/python3.10/site-packages/accelerate/data_loader.py", line 452, in iter
current_batch = next(dataloader_iter)
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in next
data = self._next_data()
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/data22/llava/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/data22/llava/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/data2/LLaVA-main/llava/train/train.py", line 751, in call
input_ids = torch.nn.utils.rnn.pad_sequence(
File "/data22/llava/lib/python3.10/site-packages/torch/nn/utils/rnn.py", line 400, in pad_sequence
return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
TypeError: pad_sequence(): argument 'padding_value' (position 3) must be float, not NoneType
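
For reference, this TypeError comes from the data collator passing tokenizer.pad_token_id to pad_sequence as padding_value; when the tokenizer defines no pad token (as with LLaMA-3), that value is None. A minimal illustration of the failing call pattern (not the actual LLaVA collator, just a sketch):

# Illustrative only: reproduces the TypeError when padding_value is None,
# which is what tokenizer.pad_token_id returns if no pad token is set.
import torch
from torch.nn.utils.rnn import pad_sequence

pad_token_id = None  # stand-in for tokenizer.pad_token_id without a pad token
sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]

# Raises: TypeError: pad_sequence(): argument 'padding_value' (position 3)
# must be float, not NoneType
batch = pad_sequence(sequences, batch_first=True, padding_value=pad_token_id)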

mmaaz60 (Member) commented Apr 30, 2024

Hi @hellangleZ,

Thank you for your interest in our work. Please make sure you have followed the steps below correctly to run the training.

STEP 1: Install all the dependencies exactly as shown below,

git clone https://github.com/mbzuai-oryx/LLaVA-pp.git
cd LLaVA-pp
git submodule update --init --recursive

pip install --upgrade pip
pip install -e .

pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3

pip install ninja
pip install flash-attn --no-build-isolation --no-cache-dir

STEP 2: Ensure that you have the correct transformers version by installing it from the pinned commit below.

pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3
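
An optional sanity check (a suggestion, not part of the official steps) is to confirm which transformers build ended up installed; with the pinned commit above, the environment reported later in this thread shows 4.41.0.dev0:

# Optional check: print the installed transformers version.
import transformers
print(transformers.__version__)  # around 4.41.0.dev0 for the pinned commit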

STEP 3: Ensure that you have copied all the relevant files into the LLaVA directory,

For LLaMA-3, do the following,

cp LLaMA-3-V/train.py LLaVA/llava/train/train.py
cp LLaMA-3-V/conversation.py LLaVA/llava/conversation.py
cp LLaMA-3-V/builder.py LLaVA/llava/model/builder.py
cp LLaMA-3-V/llava_llama.py LLaVA/llava/model/language_model/llava_llama.py

For Phi-3, do the following,

cp Phi-3-V/train.py LLaVA/llava/train/train.py
cp Phi-3-V/llava_phi3.py LLaVA/llava/model/language_model/llava_phi3.py
cp Phi-3-V/builder.py LLaVA/llava/model/builder.py
cp Phi-3-V/model__init__.py LLaVA/llava/model/__init__.py
cp Phi-3-V/main__init__.py LLaVA/llava/__init__.py
cp Phi-3-V/conversation.py LLaVA/llava/conversation.py

STEP 4: Make sure you are using --version plain for pretraining, --version llama3 for LLaMA-3 based fine-tuning and --version phi3_instruct for Phi-3 based fine-tuning.

STEP 5: Use meta-llama/Meta-Llama-3-8B-Instruct as the base model for LLaMA-3 based training, and microsoft/Phi-3-mini-4k-instruct as the base model for Phi-3 based training.

I hope this solves the issue. If it does not, please provide step-by-step instructions to reproduce it so that we can assist you better.

Good Luck :)

@hellangleZ (Author)

(Quoting @mmaaz60's setup instructions above.)

Hi @mmaaz60,

[screenshot]

For this step:

[screenshot]

Should it be in the LLaVA folder or just the LLaVA-pp folder?

mmaaz60 (Member) commented May 1, 2024

Hi @hellangleZ

It should be in the LLaVA-pp/LLaVA folder.

@hellangleZ (Author)

Hi @mmaaz60,

I followed all the steps one by one, but there is still a bug.

(Replying to @mmaaz60: "It should be in the LLaVA-pp/LLaVA folder.")

[screenshots of the setup steps]

Steps 4 and 5:

[screenshot]

But:

[screenshot of the error]

I am sure the code is there:

[screenshot]

@hellangleZ (Author)

Hi @mmaaz60,

The same issue also occurs with LLaMA-3 pretraining.

(Replying to @mmaaz60: "It should be in the LLaVA-pp/LLaVA folder.")

[screenshot of the error]

Luo-Z13 commented May 1, 2024

(Quoting @mmaaz60's setup instructions above.)

Hello @mmaaz60,

I have been following the installation process you provided exactly, with the exception of the version of accelerate (I am using accelerate==0.29.3). Here are the specific steps and issues I encountered:

  1. After running pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3, I received the following errors:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llava 1.2.2.post1 requires tokenizers==0.15.1, but you have tokenizers 0.19.1 which is incompatible.
llava 1.2.2.post1 requires transformers==4.37.2, but you have transformers 4.41.0.dev0 which is incompatible.

  2. Then, during LoRA fine-tuning, the default version of accelerate was too old, which resulted in the following error: TypeError: Accelerator.__init__() got an unexpected keyword argument 'use_seedable_sampler'. To resolve this, I ran pip install accelerate --upgrade, which updated accelerate to version 0.29.3.

  3. Afterwards, I encountered another error: TypeError: pad_sequence(): argument 'padding_value' (position 3) must be float, not NoneType.

Could you please help me diagnose and resolve these issues? Here's my current environment setup:

accelerate                0.29.3
aiofiles                  23.2.1
altair                    5.3.0
annotated-types           0.6.0
anyio                     4.3.0
appdirs                   1.4.4
attrs                     23.2.0
bitsandbytes              0.42.0
certifi                   2024.2.2
charset-normalizer        3.3.2
click                     8.1.7
contourpy                 1.2.1
cycler                    0.12.1
deepspeed                 0.12.6
docker-pycreds            0.4.0
einops                    0.6.1
einops-exts               0.0.4
exceptiongroup            1.2.1
fastapi                   0.110.3
ffmpy                     0.3.2
filelock                  3.14.0
flash-attn                2.5.8
fonttools                 4.51.0
fsspec                    2024.3.1
gitdb                     4.0.11
GitPython                 3.1.43
gradio                    4.16.0
gradio_client             0.8.1
h11                       0.14.0
hjson                     3.1.0
httpcore                  0.17.3
httpx                     0.24.0
huggingface-hub           0.22.2
idna                      3.7
importlib_resources       6.4.0
Jinja2                    3.1.3
joblib                    1.4.0
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
kiwisolver                1.4.5
llava                     1.2.2.post1 /usr/VLM/LLaVA-pp
markdown-it-py            3.0.0
markdown2                 2.4.13
MarkupSafe                2.1.5
matplotlib                3.8.4
mdurl                     0.1.2
mpmath                    1.3.0
networkx                  3.3
ninja                     1.11.1.1
numpy                     1.26.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.18.1
nvidia-nvjitlink-cu12     12.4.127
nvidia-nvtx-cu12          12.1.105
orjson                    3.10.1
packaging                 24.0
pandas                    2.2.2
peft                      0.10.0
pillow                    10.3.0
pip                       24.0
protobuf                  4.25.3
psutil                    5.9.8
py-cpuinfo                9.0.0
pydantic                  2.7.1
pydantic_core             2.18.2
pydub                     0.25.1
Pygments                  2.17.2
pynvml                    11.5.0
pyparsing                 3.1.2
python-dateutil           2.9.0.post0
python-multipart          0.0.9
pytz                      2024.1
PyYAML                    6.0.1
referencing               0.35.0
regex                     2024.4.28
requests                  2.31.0
rich                      13.7.1
rpds-py                   0.18.0
ruff                      0.4.2
safetensors               0.4.3
scikit-learn              1.2.2
scipy                     1.13.0
semantic-version          2.10.0
sentencepiece             0.1.99
sentry-sdk                2.0.1
setproctitle              1.3.3
setuptools                68.2.2
shellingham               1.5.4
shortuuid                 1.0.13
six                       1.16.0
smmap                     5.0.1
sniffio                   1.3.1
starlette                 0.37.2
svgwrite                  1.4.3
sympy                     1.12
threadpoolctl             3.5.0
timm                      0.6.13
tokenizers                0.19.1
tomlkit                   0.12.0
toolz                     0.12.1
torch                     2.1.2
torchvision               0.16.2
tqdm                      4.66.2
transformers              4.41.0.dev0
triton                    2.1.0
typer                     0.12.3
typing_extensions         4.11.0
tzdata                    2024.1
urllib3                   2.2.1
uvicorn                   0.29.0
wandb                     0.16.6
wavedrom                  2.0.3.post3
websockets                11.0.3
wheel                     0.41.2

Thank you for your help!

@hellangleZ (Author)

(Replying to @mmaaz60: "It should be in the LLaVA-pp/LLaVA folder.")

[screenshots]

STEP 2:

[screenshot]

STEP 3:

[screenshot]

Great, it works now. It was a DeepSpeed issue.

mmaaz60 (Member) commented May 1, 2024

Hi @Luo-Z13,

  • The errors from pip's dependency resolver can be ignored.
  • The error TypeError: pad_sequence(): argument 'padding_value' (position 3) must be float, not NoneType occurs during LLaMA-3 based model training. LLaMA-3 does not define a pad token, but LLaVA-LLaMA-3 training needs one, so the workaround is to add a special pad token and resize the embeddings. This is done in smart_tokenizer_and_embedding_resize() in train.py; see the sketch below.
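
For reference, here is a minimal sketch of that pad-token workaround, assuming a standard Hugging Face tokenizer and causal LM; the function name and the mean initialization follow the usual smart_tokenizer_and_embedding_resize pattern but are illustrative, not the exact repository code:

def add_pad_token_and_resize(tokenizer, model, pad_token="<pad>"):
    # Register a pad token (LLaMA-3 tokenizers do not ship with one) and
    # grow the embedding matrices to cover the new vocabulary entry.
    num_new = tokenizer.add_special_tokens({"pad_token": pad_token})
    model.resize_token_embeddings(len(tokenizer))

    if num_new > 0:
        # Initialize the new row(s) with the mean of the existing rows so
        # training starts from a reasonable point.
        input_emb = model.get_input_embeddings().weight.data
        output_emb = model.get_output_embeddings().weight.data
        input_emb[-num_new:] = input_emb[:-num_new].mean(dim=0, keepdim=True)
        output_emb[-num_new:] = output_emb[:-num_new].mean(dim=0, keepdim=True)

# Illustrative usage: call add_pad_token_and_resize(tokenizer, model) right
# after loading the LLaMA-3 tokenizer and model, before building the data
# collator, so tokenizer.pad_token_id is no longer None.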

Please make sure that the baseline official LLaVA code works properly, then make sure to copy all the LLaMA-3 related files into the corresponding directories. Lastly, note that to run LLaMA-3 based training you need to pass --version llama3.

I hope this helps solve the issue. Good luck.

mmaaz60 (Member) commented May 1, 2024

Hi @hellangleZ @Luo-Z13,

I am closing this issue as @hellangleZ was able to run the training. Please feel free to create a new issue if you have any questions or encounter any other errors. I appreciate your cooperation. Thank you.

mmaaz60 closed this as completed on May 1, 2024
pythonlearner1025 pushed a commit to pythonlearner1025/LLaVA-pp that referenced this issue May 8, 2024