
Tokenization mismatch in Phi-3 during the finetune process #17

Closed

hellangleZ opened this issue May 3, 2024 · 5 comments

@hellangleZ

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using conversation format: phi3
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using conversation format: phi3
[2024-05-03 23:34:36,587] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 586, num_elems = 4.12B
Formatting inputs...Skip in lazy mode
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Parameter Offload: Total persistent parameters: 530432 in 312 params
0%| | 0/5198 [00:00<?, ?it/s]
WARNING: tokenization mismatch: 565 vs. 569. (ignored)
WARNING: tokenization mismatch: 505 vs. 514. (ignored)
WARNING: tokenization mismatch: 505 vs. 509. (ignored)
WARNING: tokenization mismatch: 505 vs. 514. (ignored)
WARNING: tokenization mismatch: 510 vs. 519. (ignored)
WARNING: tokenization mismatch: 465 vs. 485. (ignored)
WARNING: tokenization mismatch: 336 vs. 340. (ignored)
WARNING: tokenization mismatch: 494 vs. 497. (ignored)
WARNING: tokenization mismatch: 471 vs. 480. (ignored)
WARNING: tokenization mismatch: 524 vs. 533. (ignored)
WARNING: tokenization mismatch: 477 vs. 485. (ignored)
WARNING: tokenization mismatch: 509 vs. 518. (ignored)
WARNING: tokenization mismatch: 514 vs. 523. (ignored)
WARNING: tokenization mismatch: 539 vs. 566. (ignored)
WARNING: tokenization mismatch: 672 vs. 703. (ignored)
WARNING: tokenization mismatch: 322 vs. 336. (ignored)
WARNING: tokenization mismatch: 516 vs. 525. (ignored)
WARNING: tokenization mismatch: 508 vs. 517. (ignored)
WARNING: tokenization mismatch: 501 vs. 510. (ignored)
WARNING: tokenization mismatch: 503 vs. 528. (ignored)
WARNING: tokenization mismatch: 529 vs. 538. (ignored)
WARNING: tokenization mismatch: 477 vs. 485. (ignored)
WARNING: tokenization mismatch: 502 vs. 511. (ignored)
WARNING: tokenization mismatch: 467 vs. 475. (ignored)
WARNING: tokenization mismatch: 536 vs. 545. (ignored)
WARNING: tokenization mismatch: 512 vs. 521. (ignored)
WARNING: tokenization mismatch: 302 vs. 307. (ignored)
WARNING: tokenization mismatch: 365 vs. 371. (ignored)
WARNING: tokenization mismatch: 337 vs. 354. (ignored)
WARNING: tokenization mismatch: 152 vs. 158. (ignored)
WARNING: tokenization mismatch: 526 vs. 535. (ignored)
WARNING: tokenization mismatch: 371 vs. 374. (ignored)
WARNING: tokenization mismatch: 325 vs. 341. (ignored)
WARNING: tokenization mismatch: 372 vs. 390. (ignored)
WARNING: tokenization mismatch: 480 vs. 483. (ignored)
WARNING: tokenization mismatch: 544 vs. 548. (ignored)
WARNING: tokenization mismatch: 138 vs. 142. (ignored)
WARNING: tokenization mismatch: 630 vs. 633. (ignored)
WARNING: tokenization mismatch: 200 vs. 203. (ignored)
WARNING: tokenization mismatch: 227 vs. 236. (ignored)
WARNING: tokenization mismatch: 221 vs. 225. (ignored)
WARNING: tokenization mismatch: 494 vs. 503. (ignored)
WARNING: tokenization mismatch: 398 vs. 405. (ignored)
WARNING: tokenization mismatch: 121 vs. 125. (ignored)
WARNING: tokenization mismatch: 516 vs. 525. (ignored)
WARNING: tokenization mismatch: 404 vs. 411. (ignored)
WARNING: tokenization mismatch: 511 vs. 520. (ignored)
WARNING: tokenization mismatch: 135 vs. 139. (ignored)
WARNING: tokenization mismatch: 339 vs. 343. (ignored)
WARNING: tokenization mismatch: 353 vs. 357. (ignored)
WARNING: tokenization mismatch: 172 vs. 175. (ignored)
WARNING: tokenization mismatch: 332 vs. 338. (ignored)
WARNING: tokenization mismatch: 128 vs. 132. (ignored)
WARNING: tokenization mismatch: 153 vs. 157. (ignored)
WARNING: tokenization mismatch: 249 vs. 259. (ignored)
WARNING: tokenization mismatch: 356 vs. 371. (ignored)
WARNING: tokenization mismatch: 509 vs. 518. (ignored)
WARNING: tokenization mismatch: 516 vs. 525. (ignored)
WARNING: tokenization mismatch: 118 vs. 121. (ignored)
WARNING: tokenization mismatch: 150 vs. 154. (ignored)
WARNING: tokenization mismatch: 155 vs. 159. (ignored)
WARNING: tokenization mismatch: 172 vs. 179. (ignored)
WARNING: tokenization mismatch: 123 vs. 126. (ignored)
WARNING: tokenization mismatch: 102 vs. 105. (ignored)
WARNING: tokenization mismatch: 123 vs. 127. (ignored)
WARNING: tokenization mismatch: 106 vs. 109. (ignored)
WARNING: tokenization mismatch: 505 vs. 514. (ignored)
WARNING: tokenization mismatch: 387 vs. 393. (ignored)
WARNING: tokenization mismatch: 128 vs. 132. (ignored)
WARNING: tokenization mismatch: 498 vs. 507. (ignored)
WARNING: tokenization mismatch: 91 vs. 94. (ignored)
WARNING: tokenization mismatch: 117 vs. 121. (ignored)
WARNING: tokenization mismatch: 151 vs. 155. (ignored)
WARNING: tokenization mismatch: 194 vs. 203. (ignored)
WARNING: tokenization mismatch: 126 vs. 130. (ignored)
WARNING: tokenization mismatch: 91 vs. 94. (ignored)
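
For context: in LLaVA-style training code, this warning comes from the preprocessing step that masks non-assistant tokens in the labels. It sums the token counts it assigns per conversation round and compares the total against the length of the fully tokenized conversation; on a mismatch, the whole sample is masked out (hence "(ignored)"). A simplified sketch of the pattern, loosely following the upstream LLaVA train.py (the variable names match upstream; the details here are illustrative, not the exact repo code):

```python
IGNORE_INDEX = -100  # label value that the loss function skips

def mask_and_check(target, rounds, tokenizer, total_len):
    """Mask per-round instruction tokens and verify the length bookkeeping.

    target:    list of label ids for the whole conversation
    rounds:    the conversation split into per-round strings
    total_len: number of ids in the fully tokenized conversation
    """
    cur_len = 1  # account for the BOS token the tokenizer prepends
    for rou in rounds:
        if not rou:
            continue
        round_len = len(tokenizer(rou).input_ids)
        # (upstream also overwrites the instruction part of this round
        #  with IGNORE_INDEX here, so only assistant replies get a loss)
        cur_len += round_len
    if cur_len != total_len:
        target[:] = [IGNORE_INDEX] * len(target)  # drop sample from the loss
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}."
              f" (ignored)")
```

Because mismatched samples are fully masked, a run that prints this warning for most batches is effectively training on almost no supervision, so the warnings should not be ignored in practice.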

@hellangleZ
Author

I use the phi3-instruct May 1st version. Pretraining works fine, but this error occurs during the fine-tuning process.

I found a code conflict.

In train.py:

[image]

Should it be changed to match the llama3 handling?

[image]
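
A common source of these off-by-a-few counts is that per-round tokenization and whole-conversation tokenization disagree about special tokens (for example, whether BOS is prepended, or how many ids the separator tokenizes into), and the required adjustment differs between the Llama-3 and Phi-3 tokenizers. A hedged diagnostic you can run outside training to see where the counts diverge (the helper name and the single-BOS assumption are mine, not from the repo):

```python
def diagnose_counts(tokenizer, rounds):
    # Tokenize the whole conversation once, then each round separately
    # without special tokens, and compare the totals.
    full_len = len(tokenizer("".join(rounds)).input_ids)
    summed = sum(
        len(tokenizer(r, add_special_tokens=False).input_ids) for r in rounds
    )
    summed += 1  # assumes the full tokenization adds exactly one BOS token
    print(f"full: {full_len}  summed per-round: {summed}")
    return full_len == summed
```

Note that BPE tokenizers can also merge characters across round boundaries, so even with special tokens handled the sums can differ; that is exactly the bookkeeping the train.py check guards against.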

@hellangleZ
Author

hellangleZ commented May 4, 2024

After testing, this issue does not occur with Llama-3 fine-tuning.

@mmaaz60
Member

mmaaz60 commented May 4, 2024

Hi @hellangleZ

Thank you for your interest in our work. One possible reason for this issue could be a wrong --version value. Can you please confirm that you are using --version phi3_instruct in your experiment?

If that does not resolve the issue, please provide the detailed steps you followed to run the training so that I can reproduce the error and help you better. Thank you :)
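
For readers hitting the same warning: in LLaVA-style repos the --version flag selects the conversation template used to build prompts, and the same template drives the per-round token counts checked above, so a wrong value reliably produces the mismatch. The lookup pattern below follows upstream LLaVA; whether "phi3_instruct" is registered under exactly that key in this repo is an assumption here:

```python
# Illustrative: the --version value indexes into the registered templates.
from llava import conversation as conversation_lib

conv = conversation_lib.conv_templates["phi3_instruct"].copy()  # key assumed
conv.append_message(conv.roles[0], "Describe the image.")
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())  # the separators here determine the token counts
```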

@hellangleZ
Author

hellangleZ commented May 4, 2024

> Hi @hellangleZ
>
> Thank you for your interest in our work. One possible reason for this issue could be a wrong --version value. Can you please confirm that you are using --version phi3_instruct in your experiment?
>
> If that does not resolve the issue, please provide the detailed steps you followed to run the training so that I can reproduce the error and help you better. Thank you :)

Yes, I use phi3_instruct.

Hi @mmaaz60

I found some issues in the pretrain process, like:

[image]

and

[image]

I think the problem may be due to these two fields; could you help check this against the Phi-3 repo?

My fine-tuning script copies your template, but after changing the field it still shows the tokenizer mismatch:

deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path /data2/phi3-A11 \
    --version phi3_instruct \
    --data_path /data2/llavaft/llava_v1_5_mix665k.json \
    --image_folder /data2/LLaVA-main/playground/data \
    --vision_tower /data2/openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter ./checkpoints/llava-v1.5-phi3-mini-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-phi3-mini-lora \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \

@hellangleZ
Author

Please close it; just using the newest Phi-3 model solves the problem.
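
Presumably the refreshed Phi-3 release ships updated tokenizer and chat-template files, which is what makes the per-round counts line up again. A quick hedged sanity check after re-downloading the model (the local path is the one from the script above; the special tokens are Phi-3's documented chat markers):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/data2/phi3-A11")
text = "<|user|>\nDescribe the image.<|end|>\n<|assistant|>"
ids = tok(text).input_ids
print(len(ids), ids[:8])  # re-run fine-tuning once the counts look sane
```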
