[Question] Training with Qwen2 backend got loss 0 #1153
Comments
Me too.
I found that the cause of this problem is a difference in tokenizer rules. Then I added a method to handle it. After these changes, the mismatch warning disappeared. However, I must mention that I don't have GPUs for training right now, so there may be other problems. Hope this helps.
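For context, a minimal sketch of why the mismatch produces a loss of exactly 0, assuming LLaVA's stock preprocessing behavior as described in this thread; the `bos_offset` helper is a hypothetical illustration of the kind of tokenizer-rule fix mentioned above, not the actual patch:

```python
# Minimal sketch, assuming LLaVA-style preprocessing (not the exact source).
import torch

IGNORE_INDEX = -100  # label value that cross-entropy ignores

def mask_on_mismatch(target: torch.Tensor, cur_len: int, total_len: int) -> None:
    # Mirrors the check behind the warnings in this issue: on a length
    # mismatch, every label is masked, so no token contributes to the
    # loss and it is reported as exactly 0.0.
    if cur_len != total_len:
        target[:] = IGNORE_INDEX
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}. (ignored)")

def bos_offset(tokenizer) -> int:
    # Hypothetical correction: the length bookkeeping assumes the tokenizer
    # prepends a BOS token, but Qwen tokenizers do not, which is consistent
    # with the off-by-one (47 vs. 48) warnings reported in this issue.
    return 1 if getattr(tokenizer, "bos_token", None) is None else 0
```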
@yiyexy Hello, nice catch. I am training normally now.
Yes, I trained on LLaVA pretrain data. Unfortunately, I don't have data to enhance the model's capability in Chinese. By the way, I'm currently developing a new data processing pipeline that may solve this problem one day.
@yiyexy Would you consider sharing your processing pipeline? Which part of the problem does it solve? There is some Chinese data available, but I think its quality is poor.
@lucasjinreal I will. But it still has some problems to be solved. It's a long way to go.
@yiyexy Hello, your loss doesn't look like stage 1. BTW, you should probably use the qwen1.5-7b-chat model; otherwise you cannot SFT efficiently. However, Qwen uses the ChatML chat format, not the LLaVA default. How did you change it?
You are right, the loss is from stage 2, and I use the qwen1.5-7b-chat model for this stage. BTW, I didn't run into problems with the format; the SFT training is normal. Maybe I overlooked something.
@20191864218 Maybe you need to set some parameters for Qwen1.5. See #1146.
@yiyexy Using the LLaVA template on a Qwen chat model might introduce unwanted output at chat time. This is a common issue: Qwen uses the ChatML format, which uses <|im_end|> as the separator.
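For reference, a minimal sketch of what a ChatML-formatted prompt looks like; the helper function is illustrative and not part of either codebase:

```python
# Illustrative helper: build a ChatML prompt as Qwen chat models expect.
# <|im_start|>/<|im_end|> are Qwen's ChatML delimiters; the system text
# here is just an example.
def build_chatml_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for role, content in turns:  # role is "user" or "assistant"
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation starts here
    return "".join(parts)

print(build_chatml_prompt(
    "You are a helpful assistant.",
    [("user", "<image>\nWhat is in this picture?")],
))
```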
Thanks for your reminder. I will pay attention to this issue. I haven't trained a llava-qwen model due to a lack of GPU resources and other work commitments. I will train one as soon as possible and share the results with you.
@yiyexy Thank you. I am doing the finetuning stage now. I may try converting to the ChatML format to see what happens; looking forward to your results.
Thank you, but I've encountered some issues after making the changes. Could you help me with them?
So are you using qwen-chat for LLaVA SFT?
Yes, I am using the ChatML format for training now and will update info here. This is how the Qwen-1.8B stage 2 loss currently goes: [loss curve screenshot]
@lucasjinreal I met the same problem. Can you share your code for using the qwen1.5-chat LLM?
Using qwen1.5-7b-chat, the pretrain stage is normal, but the SFT stage loss is zero. I checked that the conversations are aligned. Any suggestions, @lucasjinreal? During training I got a warning: checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None. Can the warning be ignored?
Seems like the inputs contain None. Check the data or add some assertions.
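A sketch of the kind of assertions meant here, assuming a typical Hugging Face-style batch layout (the key names are assumptions):

```python
# Illustrative sanity checks for a training batch; key names assume a
# typical Hugging Face collator layout.
import torch

IGNORE_INDEX = -100

def sanity_check_batch(batch: dict) -> None:
    # Guard against None entries that silently break the forward pass.
    for key in ("input_ids", "labels", "attention_mask"):
        assert batch.get(key) is not None, f"{key} is None"
    # If every label is IGNORE_INDEX (e.g., after a tokenization-mismatch
    # mask), cross-entropy has no valid targets and the loss shows as 0.0.
    assert (batch["labels"] != IGNORE_INDEX).any(), "all labels are masked"
```

Separately, that requires_grad warning usually comes from gradient checkpointing with frozen input embeddings; calling model.enable_input_require_grads() (a standard transformers helper) before training typically resolves it.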
Hello, do you have a link for replacing the visual encoder?
Hello, if using the Qwen-7B-base model for finetuning, is data in the ChatML format still required? Thank you for your help.
I think the base model cannot be used in a VLM; it doesn't have chat abilities.
I want to create a model solely for generating reports, without requiring strong conversational abilities. Can I use the llava fine-tuning data format when fine-tuning?
Did you verify your method? The LLaVA SFT data is designed for QA tasks, so the results might not be good if you use a base model.
@20191864218 This error appears to be due to a corrupted weight file. Please ensure that your weight file has been saved correctly.
Thank you for your response. I merged the LoRA weights according to the …
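For anyone else merging LoRA weights, a minimal sketch using peft (paths and the base model name are placeholders; LLaVA's own merge script may differ in details):

```python
# Sketch of merging LoRA adapters into a base model with peft.
# Paths are placeholders; adjust to your checkpoint layout.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B-Chat")
model = PeftModel.from_pretrained(base, "path/to/lora-checkpoint")
model = model.merge_and_unload()  # folds LoRA deltas into the base weights
model.save_pretrained("path/to/merged-model")

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
tokenizer.save_pretrained("path/to/merged-model")
```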
me too!!! |
Thank you! |
Hi, thanks for sharing. I am working on stage 1 training using Qwen-1.8B and found that the training loss did not decrease, while other models of varying scale (1.1B-34B) work fine. I wonder if any special change is needed for stage 1 training with Qwen?
@VincentDENGP You mean the loss did not decrease only on Qwen? Maybe you need a larger Qwen model? My loss decreased normally with Qwen-7B in stage 1. I will check out this PR later to rule out any differences.
@yiyexy Thanks for the suggestion. I just did a quick experiment, and the loss decreases normally on Qwen-7B. However, regarding parameter size, I conducted two additional experiments, and it is weird that the loss decreases normally for both TinyLlama 1.1B and StableLM 1.6B; only Qwen-1.5-0.5B and Qwen-1.5-1.8B fail to decrease.
Hey, can you share the code for making the LLM backend work with Qwen2?
Hey, can you share the code for making the LLM backend work with Qwen-7B?
Hello! I used CC3M-Pretrain-595K to pretrain Qwen2-1.5B and a small amount of Chinese data (about 1,000 samples) for finetuning. However, when I use the following code to infer: args = type('Args', (), {...}); eval_model(args), I get nonsense responses. Can anyone help? Thanks!
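The truncated snippet above looks like the eval_model pattern from the LLaVA README; a sketch of that pattern with placeholder path, prompt, and image (the field values are assumptions):

```python
# Sketch following LLaVA's documented eval_model pattern; the model path,
# query, and image file are placeholders.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "path/to/llava-qwen2-1.5b"  # placeholder

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "What is shown in this image?",
    "conv_mode": None,          # or set the template the model was trained with
    "image_file": "path/to/image.jpg",
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```

If the model was finetuned with a ChatML-style template, a mismatched conv_mode at inference time is a likely source of nonsense output.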
I recently trained with Qwen2. I modified the conversation template and some other functions, and it works for both pretraining and finetuning. Here is my working repository: https://github.com/TobyYang7/Llava_Qwen2
Have the results improved after fine-tuning using this template? |
Due to limited GPU resources, I do not have preliminary results yet. You can prepare the dataset and give it a try.
Question
I got a loss of 0 when training with the Qwen2 backend:
{'loss': 0.0, 'learning_rate': 0.00015267175572519084, 'epoch': 0.0}
0%|▎ | 20/8720 [01:38<11:01:39, 4.56s/it]WARNING: tokenization mismatch: 47 vs. 48. (ignored)
WARNING: tokenization mismatch: 54 vs. 55. (ignored)
WARNING: tokenization mismatch: 46 vs. 47. (ignored)
WARNING: tokenization mismatch: 43 vs. 44. (ignored)
What could be the reason for this?