In what order should I reproduce the paper? #12

Open
rixejzvdl649 opened this issue Jun 27, 2024 · 6 comments
@rixejzvdl649

step1
pretrain_projector_image_encoder.sh
step2
pretrain_projector_video_encoder.sh
step3
finetune_dual_encoder.sh
step4
eval/vcgbench/inference/run_ddp_inference.sh
step5
eval/vcgbench/gpt_evaluation/vcgbench_evaluate.sh

#!/bin/sh
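# finetune_dual_encoder.sh (step 3): LoRA finetuning of the dual-encoder model,
# loading the image and video projectors pretrained in steps 1 and 2.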


export DATASET_DIR=/mnt2/ninghuayang/data/videogpt_plus_dataset

BASE_LLM_PATH=microsoft/Phi-3-mini-4k-instruct
VISION_TOWER=OpenGVLab/InternVideo2-Stage2_1B-224p-f4
IMAGE_VISION_TOWER=openai/clip-vit-large-patch14-336
PROJECTOR_TYPE=mlp2x_gelu
#PRETRAIN_VIDEO_MLP_PATH=MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain/mlp2x_gelu_internvideo2/mm_projector.bin
#PRETRAIN_IMAGE_MLP_PATH=MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain/mlp2x_gelu_clip_l14_336px/mm_projector.bin
PRETRAIN_VIDEO_MLP_PATH=results/mlp2x_gelu_internvideo2/mm_projector.bin
PRETRAIN_IMAGE_MLP_PATH=results/mlp2x_gelu_clip_l14_336px/mm_projector.bin
OUTPUT_DIR_PATH=results/videogpt_plus_finetune

deepspeed videogpt_plus/train/train.py \
--lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
--deepspeed scripts/zero3.json \
--model_name_or_path "$BASE_LLM_PATH" \
--version phi3_instruct \
--dataset_use FINETUNING \
--vision_tower "$VISION_TOWER" \
--image_vision_tower "$IMAGE_VISION_TOWER" \
--mm_projector_type "$PROJECTOR_TYPE" \
--image_mm_projector_type "$PROJECTOR_TYPE" \
--pretrain_mm_mlp_adapter "$PRETRAIN_VIDEO_MLP_PATH" \
--pretrain_image_mm_mlp_adapter "$PRETRAIN_IMAGE_MLP_PATH" \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--group_by_modality_length True \
--bf16 True \
--output_dir $OUTPUT_DIR_PATH \
--num_train_epochs 1 \
--per_device_train_batch_size 24 \
--per_device_eval_batch_size 16 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 50000 \
--save_total_limit 1 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 4096 \
--gradient_checkpointing True \
--dataloader_num_workers 16 \
--lazy_preprocess True \
--report_to none
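
For reference, a minimal sketch of chaining the five steps listed above; it assumes the three training scripts live under scripts/, that DATASET_DIR points at the downloaded dataset, and that the eval scripts are given whatever checkpoint/output arguments they expect:

#!/bin/sh
# Sketch only: run the reproduction pipeline end to end; script locations and paths are assumptions.
set -e
export DATASET_DIR=/path/to/videogpt_plus_dataset

bash scripts/pretrain_projector_image_encoder.sh         # step 1: pretrain CLIP image projector
bash scripts/pretrain_projector_video_encoder.sh         # step 2: pretrain InternVideo2 video projector
bash scripts/finetune_dual_encoder.sh                    # step 3: dual-encoder finetuning (script above)
bash eval/vcgbench/inference/run_ddp_inference.sh        # step 4: VCGBench inference (may need model/output args)
bash eval/vcgbench/gpt_evaluation/vcgbench_evaluate.sh   # step 5: GPT-based VCGBench scoring (may need API key/args)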

@rixejzvdl649
Author

In the official example, the two benchmarks each have their own weights:

VideoGPT-plus/MBZUAI/VideoGPT-plus_Phi3-mini-4k/mvbench
VideoGPT-plus/MBZUAI/VideoGPT-plus_Phi3-mini-4k/vcgbench

@rixejzvdl649
Author

step1
pretrain_projector_image_encoder.sh
step2
pretrain_projector_video_encoder.sh
step3
finetune_dual_encoder.sh
step4
eval/vcgbench/inference/run_ddp_inference.sh
step5
eval/vcgbench/gpt_evaluation/vcgbench_evaluate.sh

So besides steps 1-3 and steps 4-5 above, is there any other information or step that I missed?

@rixejzvdl649
Author

from .dataset_config import *

DataConfig = {
    "PRETRAINING": [CC3M_595K, COCO_CAP, COCO_REG, COCO_REC],

    "FINETUNING": [CONV_VideoChatGPT, VCG_HUMAN, VCG_PLUS_112K, CAPTION_VIDEOCHAT, CLASSIFICATION_K710, CLASSIFICATION_SSV2, CONV_VideoChat1, REASONING_NExTQA, REASONING_CLEVRER_QA, REASONING_CLEVRER_MC, VQA_WEBVID_QA],

    "VCGBench_FINETUNING": [CONV_VideoChatGPT, VCG_HUMAN, VCG_PLUS_112K, CAPTION_VIDEOCHAT, CONV_VideoChat1, VQA_WEBVID_QA],
    "MVBench_FINETUNING": [CLASSIFICATION_K710, CLASSIFICATION_SSV2, CONV_VideoChatGPT, REASONING_NExTQA, REASONING_CLEVRER_QA, REASONING_CLEVRER_MC, VQA_WEBVID_QA],

}

@rixejzvdl649
Author

I didn't use VCGBench_FINETUNING or MVBench_FINETUNING. Will that cause any problems?

@mmaaz60
Member

mmaaz60 commented Jun 28, 2024

Hi @rixejzvdl649,

Thank you for your interest in our work and for providing detailed information about your question. The steps you mentioned for reproducing our results (pretraining + finetuning + evaluation) are correct.

However, please note that we finetune two variants of VideoGPT+. The first variant, finetuned using VCGBench_FINETUNING, is used to evaluate on VCGBench and VCGBench-Diverse; the second variant, finetuned using MVBench_FINETUNING, is used to evaluate on MVBench.

I hope this helps. Please let me know if you have any questions.
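
In case a concrete example helps, a minimal sketch of the two finetuning runs, assuming each one simply swaps the --dataset_use key and writes to its own output directory (directory names are illustrative; all other arguments as in finetune_dual_encoder.sh):

# Variant 1: evaluated on VCGBench and VCGBench-Diverse
# (remaining arguments identical to finetune_dual_encoder.sh above)
deepspeed videogpt_plus/train/train.py \
    --dataset_use VCGBench_FINETUNING \
    --output_dir results/videogpt_plus_finetune_vcgbench

# Variant 2: evaluated on MVBench
# (remaining arguments identical to finetune_dual_encoder.sh above)
deepspeed videogpt_plus/train/train.py \
    --dataset_use MVBench_FINETUNING \
    --output_dir results/videogpt_plus_finetune_mvbench

Each resulting checkpoint is then used with its corresponding eval pipeline, matching the per-benchmark weight folders (vcgbench, mvbench) in the official release.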

@qianwangn

@mmaaz60 Hello, can stage 1 and stage 2 be run in parallel?
