
Tokenization mismatch in Phi-3 during the finetune process #17

Closed

hellangleZ opened this issue May 3, 2024 · 5 comments

@hellangleZ

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using conversation format: phi3
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using conversation format: phi3
[2024-05-03 23:34:36,587] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 586, num_elems = 4.12B
Formatting inputs...Skip in lazy mode
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Parameter Offload: Total persistent parameters: 530432 in 312 params
0%| | 0/5198 [00:00<?, ?it/s]
WARNING: tokenization mismatch: 565 vs. 569. (ignored)
WARNING: tokenization mismatch: 505 vs. 514. (ignored)
WARNING: tokenization mismatch: 505 vs. 509. (ignored)
WARNING: tokenization mismatch: 505 vs. 514. (ignored)
WARNING: tokenization mismatch: 510 vs. 519. (ignored)
WARNING: tokenization mismatch: 465 vs. 485. (ignored)
WARNING: tokenization mismatch: 336 vs. 340. (ignored)
WARNING: tokenization mismatch: 494 vs. 497. (ignored)
WARNING: tokenization mismatch: 471 vs. 480. (ignored)
WARNING: tokenization mismatch: 524 vs. 533. (ignored)
WARNING: tokenization mismatch: 477 vs. 485. (ignored)
WARNING: tokenization mismatch: 509 vs. 518. (ignored)
WARNING: tokenization mismatch: 514 vs. 523. (ignored)
WARNING: tokenization mismatch: 539 vs. 566. (ignored)
WARNING: tokenization mismatch: 672 vs. 703. (ignored)
WARNING: tokenization mismatch: 322 vs. 336. (ignored)
WARNING: tokenization mismatch: 516 vs. 525. (ignored)
WARNING: tokenization mismatch: 508 vs. 517. (ignored)
WARNING: tokenization mismatch: 501 vs. 510. (ignored)
WARNING: tokenization mismatch: 503 vs. 528. (ignored)
WARNING: tokenization mismatch: 529 vs. 538. (ignored)
WARNING: tokenization mismatch: 477 vs. 485. (ignored)
WARNING: tokenization mismatch: 502 vs. 511. (ignored)
WARNING: tokenization mismatch: 467 vs. 475. (ignored)
WARNING: tokenization mismatch: 536 vs. 545. (ignored)
WARNING: tokenization mismatch: 512 vs. 521. (ignored)
WARNING: tokenization mismatch: 302 vs. 307. (ignored)
WARNING: tokenization mismatch: 365 vs. 371. (ignored)
WARNING: tokenization mismatch: 337 vs. 354. (ignored)
WARNING: tokenization mismatch: 152 vs. 158. (ignored)
WARNING: tokenization mismatch: 526 vs. 535. (ignored)
WARNING: tokenization mismatch: 371 vs. 374. (ignored)
WARNING: tokenization mismatch: 325 vs. 341. (ignored)
WARNING: tokenization mismatch: 372 vs. 390. (ignored)
WARNING: tokenization mismatch: 480 vs. 483. (ignored)
WARNING: tokenization mismatch: 544 vs. 548. (ignored)
WARNING: tokenization mismatch: 138 vs. 142. (ignored)
WARNING: tokenization mismatch: 630 vs. 633. (ignored)
WARNING: tokenization mismatch: 200 vs. 203. (ignored)
WARNING: tokenization mismatch: 227 vs. 236. (ignored)
WARNING: tokenization mismatch: 221 vs. 225. (ignored)
WARNING: tokenization mismatch: 494 vs. 503. (ignored)
WARNING: tokenization mismatch: 398 vs. 405. (ignored)
WARNING: tokenization mismatch: 121 vs. 125. (ignored)
WARNING: tokenization mismatch: 516 vs. 525. (ignored)
WARNING: tokenization mismatch: 404 vs. 411. (ignored)
WARNING: tokenization mismatch: 511 vs. 520. (ignored)
WARNING: tokenization mismatch: 135 vs. 139. (ignored)
WARNING: tokenization mismatch: 339 vs. 343. (ignored)
WARNING: tokenization mismatch: 353 vs. 357. (ignored)
WARNING: tokenization mismatch: 172 vs. 175. (ignored)
WARNING: tokenization mismatch: 332 vs. 338. (ignored)
WARNING: tokenization mismatch: 128 vs. 132. (ignored)
WARNING: tokenization mismatch: 153 vs. 157. (ignored)
WARNING: tokenization mismatch: 249 vs. 259. (ignored)
WARNING: tokenization mismatch: 356 vs. 371. (ignored)
WARNING: tokenization mismatch: 509 vs. 518. (ignored)
WARNING: tokenization mismatch: 516 vs. 525. (ignored)
WARNING: tokenization mismatch: 118 vs. 121. (ignored)
WARNING: tokenization mismatch: 150 vs. 154. (ignored)
WARNING: tokenization mismatch: 155 vs. 159. (ignored)
WARNING: tokenization mismatch: 172 vs. 179. (ignored)
WARNING: tokenization mismatch: 123 vs. 126. (ignored)
WARNING: tokenization mismatch: 102 vs. 105. (ignored)
WARNING: tokenization mismatch: 123 vs. 127. (ignored)
WARNING: tokenization mismatch: 106 vs. 109. (ignored)
WARNING: tokenization mismatch: 505 vs. 514. (ignored)
WARNING: tokenization mismatch: 387 vs. 393. (ignored)
WARNING: tokenization mismatch: 128 vs. 132. (ignored)
WARNING: tokenization mismatch: 498 vs. 507. (ignored)
WARNING: tokenization mismatch: 91 vs. 94. (ignored)
WARNING: tokenization mismatch: 117 vs. 121. (ignored)
WARNING: tokenization mismatch: 151 vs. 155. (ignored)
WARNING: tokenization mismatch: 194 vs. 203. (ignored)
WARNING: tokenization mismatch: 126 vs. 130. (ignored)
WARNING: tokenization mismatch: 91 vs. 94. (ignored)
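
For context: in LLaVA-style training code, this warning comes from the preprocessing step that masks non-assistant tokens in the labels. It sums the token counts it assigns per conversation round and compares the total against the length of the fully tokenized conversation; on a mismatch, the whole sample is masked out (hence "(ignored)"). A simplified sketch of the pattern, loosely following the upstream LLaVA train.py (the variable names match upstream; the details here are illustrative, not the exact repo code):

```python
IGNORE_INDEX = -100  # label value that the loss function skips

def mask_and_check(target, rounds, tokenizer, total_len):
    """Mask per-round instruction tokens and verify the length bookkeeping.

    target:    list of label ids for the whole conversation
    rounds:    the conversation split into per-round strings
    total_len: number of ids in the fully tokenized conversation
    """
    cur_len = 1  # account for the BOS token the tokenizer prepends
    for rou in rounds:
        if not rou:
            continue
        round_len = len(tokenizer(rou).input_ids)
        # (upstream also overwrites the instruction part of this round
        #  with IGNORE_INDEX here, so only assistant replies get a loss)
        cur_len += round_len
    if cur_len != total_len:
        target[:] = [IGNORE_INDEX] * len(target)  # drop sample from the loss
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}."
              f" (ignored)")
```

Because mismatched samples are fully masked, a run that prints this warning for most batches is effectively training on almost no supervision, so the warnings should not be ignored in practice.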

@hellangleZ
Author

I use the phi3-instruct May 1st version. Pretraining works fine, but this error occurs during the fine-tuning process.

I found a code conflict.

In train.py:

[image]

Should it be changed to match the llama3 handling?

[image]
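
A common source of these off-by-a-few counts is that per-round tokenization and whole-conversation tokenization disagree about special tokens (for example, whether BOS is prepended, or how many ids the separator tokenizes into), and the required adjustment differs between the Llama-3 and Phi-3 tokenizers. A hedged diagnostic you can run outside training to see where the counts diverge (the helper name and the single-BOS assumption are mine, not from the repo):

```python
def diagnose_counts(tokenizer, rounds):
    # Tokenize the whole conversation once, then each round separately
    # without special tokens, and compare the totals.
    full_len = len(tokenizer("".join(rounds)).input_ids)
    summed = sum(
        len(tokenizer(r, add_special_tokens=False).input_ids) for r in rounds
    )
    summed += 1  # assumes the full tokenization adds exactly one BOS token
    print(f"full: {full_len}  summed per-round: {summed}")
    return full_len == summed
```

Note that BPE tokenizers can also merge characters across round boundaries, so even with special tokens handled the sums can differ; that is exactly the bookkeeping the train.py check guards against.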

@hellangleZ
Author

hellangleZ commented May 4, 2024

After testing, this issue does not occur with Llama-3 fine-tuning.

@mmaaz60
Member

mmaaz60 commented May 4, 2024

Hi @hellangleZ

Thank you for your interest in our work. One possible reason for this issue could be a wrong --version value. Can you please confirm that you are using --version phi3_instruct in your experiment?

If that does not resolve the issue, please provide the detailed steps you followed to run the training so that I can reproduce the error and help you better. Thank you :)
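
For readers hitting the same warning: in LLaVA-style repos the --version flag selects the conversation template used to build prompts, and the same template drives the per-round token counts checked above, so a wrong value reliably produces the mismatch. The lookup pattern below follows upstream LLaVA; whether "phi3_instruct" is registered under exactly that key in this repo is an assumption here:

```python
# Illustrative: the --version value indexes into the registered templates.
from llava import conversation as conversation_lib

conv = conversation_lib.conv_templates["phi3_instruct"].copy()  # key assumed
conv.append_message(conv.roles[0], "Describe the image.")
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())  # the separators here determine the token counts
```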

@hellangleZ
Author

hellangleZ commented May 4, 2024

> Hi @hellangleZ
>
> Thank you for your interest in our work. One possible reason for this issue could be a wrong --version value. Can you please confirm that you are using --version phi3_instruct in your experiment?
>
> If that does not resolve the issue, please provide the detailed steps you followed to run the training so that I can reproduce the error and help you better. Thank you :)

Yes, I use phi3_instruct.

Hi @mmaaz60

I found some issues in the pretrain process, like:

[image]

and

[image]

I think the problem may be due to these two fields; could you help check this against the Phi-3 repo?

My fine-tuning script copies your template, but after changing the field it still shows the tokenizer mismatch:

deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path /data2/phi3-A11 \
    --version phi3_instruct \
    --data_path /data2/llavaft/llava_v1_5_mix665k.json \
    --image_folder /data2/LLaVA-main/playground/data \
    --vision_tower /data2/openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter ./checkpoints/llava-v1.5-phi3-mini-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-phi3-mini-lora \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \

@hellangleZ
Author

Please close it; just using the newest Phi-3 model solves the problem.
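
Presumably the refreshed Phi-3 release ships updated tokenizer and chat-template files, which is what makes the per-round counts line up again. A quick hedged sanity check after re-downloading the model (the local path is the one from the script above; the special tokens are Phi-3's documented chat markers):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/data2/phi3-A11")
text = "<|user|>\nDescribe the image.<|end|>\n<|assistant|>"
ids = tok(text).input_ids
print(len(ids), ids[:8])  # re-run fine-tuning once the counts look sane
```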
