[Question] finetune LLaVA-1.5 with LoRA.: does not appear to have a file named config.json. #729

yangzian035210 · 2023-11-01T06:57:23Z

Question

Thanks for your great work! I have a question about fine-tuning LLaVA-1.5 with LoRA:
OSError: /checkpoints/llava-v1.5-13b-lora-v2/checkpoint-6000 does not appear to have a file named config.json.

dhifafaz · 2023-11-02T13:37:46Z

Have you figured it out?

CiaoHe · 2023-11-07T13:54:02Z

It seems we need to add a hook that runs whenever an intermediate checkpoint is saved, similar to the saving logic that runs after training finishes:

if training_args.lora_enable:
    state_dict = get_peft_state_maybe_zero_3(
        model.named_parameters(), training_args.lora_bias
    )
    non_lora_state_dict = get_peft_state_non_lora_maybe_zero_3(
        model.named_parameters()
    )
    if training_args.local_rank == 0 or training_args.local_rank == -1:
        model.config.save_pretrained(training_args.output_dir)
        model.save_pretrained(training_args.output_dir, state_dict=state_dict)
        torch.save(non_lora_state_dict, os.path.join(training_args.output_dir, 'non_lora_trainables.bin'))
else:
    safe_save_model_for_hf_trainer(trainer=trainer,
                                   output_dir=training_args.output_dir)
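
To make the two state dicts above easier to picture, here is a minimal sketch of the split they appear to perform. It is an assumption about the helpers' behavior for the `lora_bias == "none"` case, not a copy of LLaVA's code: parameters whose names contain `lora_` go to the adapter state dict, and the remaining trainable parameters (for example a projector) go to `non_lora_trainables.bin`. The function `split_params` and the plain tuples it takes are hypothetical stand-ins for real tensors.

```python
def split_params(named_params):
    """Split (name, value, requires_grad) tuples into LoRA and non-LoRA groups.

    Hypothetical stand-in for the two helper calls above; assumes LoRA
    parameters are identified by the substring "lora_" in their names.
    """
    lora_state = {}
    non_lora_state = {}
    for name, value, requires_grad in named_params:
        if "lora_" in name:
            # Adapter weights: saved through model.save_pretrained(...).
            lora_state[name] = value
        elif requires_grad:
            # Other trainable weights: saved to non_lora_trainables.bin.
            non_lora_state[name] = value
        # Frozen base weights are not saved in either group.
    return lora_state, non_lora_state


params = [
    ("layers.0.q_proj.weight", "frozen-base", False),
    ("layers.0.q_proj.lora_A.weight", "adapter-A", True),
    ("mm_projector.0.weight", "projector", True),
]
lora_state, non_lora_state = split_params(params)
print(sorted(lora_state))
print(sorted(non_lora_state))
```

Neither group contains `config.json`; in the snippet above that file comes from the separate `model.config.save_pretrained(...)` call, which only runs once training has finished.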

CiaoHe · 2023-11-07T15:28:06Z

I added a custom callback just above the trainer initialization:

# save callback
import os

import torch
from transformers import TrainerCallback

class SaveCallback(TrainerCallback):
    """Write the config and LoRA weights into each intermediate checkpoint."""

    def on_save(self, args, state, control, **kwargs):
        # `model` comes from the enclosing scope in train.py.
        checkpoint_dir = os.path.join(args.output_dir, 'checkpoint-{}'.format(state.global_step))
        if args.lora_enable:
            # Collect both state dicts on every rank before the rank check.
            state_dict = get_peft_state_maybe_zero_3(
                model.named_parameters(), args.lora_bias
            )
            non_lora_state_dict = get_peft_state_non_lora_maybe_zero_3(
                model.named_parameters()
            )
            # Only the main process writes files.
            if args.local_rank in [-1, 0]:
                model.config.save_pretrained(checkpoint_dir)
                model.save_pretrained(checkpoint_dir, state_dict=state_dict)
                torch.save(non_lora_state_dict, os.path.join(checkpoint_dir, 'non_lora_trainables.bin'))

and passed it in when initializing the trainer:

trainer = LLaVATrainer(model=model,
                tokenizer=tokenizer,
                args=training_args,
                callbacks=[SaveCallback()],
                **data_module)
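
One way to confirm that the callback produced complete checkpoints is to scan the output directory for missing files. The sketch below uses only the standard library. The file list is an assumption: `config.json` and `non_lora_trainables.bin` come from this thread, and `adapter_config.json` is what PEFT's `save_pretrained` normally writes alongside the adapter weights. The helper name `missing_checkpoint_files` is hypothetical.

```python
import os

# Assumed set of files a usable LoRA checkpoint should contain.
EXPECTED_FILES = ("config.json", "adapter_config.json", "non_lora_trainables.bin")


def missing_checkpoint_files(output_dir, expected=EXPECTED_FILES):
    """Map each checkpoint-* directory under output_dir to the files it lacks."""
    report = {}
    for entry in sorted(os.listdir(output_dir)):
        path = os.path.join(output_dir, entry)
        if entry.startswith("checkpoint-") and os.path.isdir(path):
            report[entry] = [
                name for name in expected
                if not os.path.isfile(os.path.join(path, name))
            ]
    return report
```

Calling `missing_checkpoint_files(training_args.output_dir)` after a save step should return an empty list for every checkpoint written with the callback enabled; a checkpoint that still lists `config.json` would fail with the `OSError` from the original question.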

wuwu-C · 2024-04-20T12:41:28Z

Hey, I ran into the same problem. Did you solve it?

user074 mentioned this issue May 1, 2024

[Usage] finetune_task_lora.sh checkpoints usage #1423

Open

supech mentioned this issue Jun 19, 2024

[Questions] "non_lora_trainables.bin" has an impact on the model #1566

Closed
