In what order should I reproduce the paper? #12

Open
rixejzvdl649 opened this issue Jun 27, 2024 · 6 comments
@rixejzvdl649

step1
pretrain_projector_image_encoder.sh
step2
pretrain_projector_video_encoder.sh
step3
finetune_dual_encoder.sh
step4
eval/vcgbench/inference/run_ddp_inference.sh
step5
eval/vcgbench/gpt_evaluation/vcgbench_evaluate.sh

#!/bin/sh
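# finetune_dual_encoder.sh (step 3): LoRA finetuning of the dual-encoder model,
# loading the image and video projectors pretrained in steps 1 and 2.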


export DATASET_DIR=/mnt2/ninghuayang/data/videogpt_plus_dataset

BASE_LLM_PATH=microsoft/Phi-3-mini-4k-instruct
VISION_TOWER=OpenGVLab/InternVideo2-Stage2_1B-224p-f4
IMAGE_VISION_TOWER=openai/clip-vit-large-patch14-336
PROJECTOR_TYPE=mlp2x_gelu
#PRETRAIN_VIDEO_MLP_PATH=MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain/mlp2x_gelu_internvideo2/mm_projector.bin
#PRETRAIN_IMAGE_MLP_PATH=MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain/mlp2x_gelu_clip_l14_336px/mm_projector.bin
PRETRAIN_VIDEO_MLP_PATH=results/mlp2x_gelu_internvideo2/mm_projector.bin
PRETRAIN_IMAGE_MLP_PATH=results/mlp2x_gelu_clip_l14_336px/mm_projector.bin
OUTPUT_DIR_PATH=results/videogpt_plus_finetune

deepspeed videogpt_plus/train/train.py \
--lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
--deepspeed scripts/zero3.json \
--model_name_or_path "$BASE_LLM_PATH" \
--version phi3_instruct \
--dataset_use FINETUNING \
--vision_tower "$VISION_TOWER" \
--image_vision_tower "$IMAGE_VISION_TOWER" \
--mm_projector_type "$PROJECTOR_TYPE" \
--image_mm_projector_type "$PROJECTOR_TYPE" \
--pretrain_mm_mlp_adapter "$PRETRAIN_VIDEO_MLP_PATH" \
--pretrain_image_mm_mlp_adapter "$PRETRAIN_IMAGE_MLP_PATH" \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--group_by_modality_length True \
--bf16 True \
--output_dir $OUTPUT_DIR_PATH \
--num_train_epochs 1 \
--per_device_train_batch_size 24 \
--per_device_eval_batch_size 16 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 50000 \
--save_total_limit 1 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 4096 \
--gradient_checkpointing True \
--dataloader_num_workers 16 \
--lazy_preprocess True \
--report_to none
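
For reference, a minimal sketch of chaining the five steps listed above; it assumes the three training scripts live under scripts/, that DATASET_DIR points at the downloaded dataset, and that the eval scripts are given whatever checkpoint/output arguments they expect:

#!/bin/sh
# Sketch only: run the reproduction pipeline end to end; script locations and paths are assumptions.
set -e
export DATASET_DIR=/path/to/videogpt_plus_dataset

bash scripts/pretrain_projector_image_encoder.sh         # step 1: pretrain CLIP image projector
bash scripts/pretrain_projector_video_encoder.sh         # step 2: pretrain InternVideo2 video projector
bash scripts/finetune_dual_encoder.sh                    # step 3: dual-encoder finetuning (script above)
bash eval/vcgbench/inference/run_ddp_inference.sh        # step 4: VCGBench inference (may need model/output args)
bash eval/vcgbench/gpt_evaluation/vcgbench_evaluate.sh   # step 5: GPT-based VCGBench scoring (may need API key/args)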

@rixejzvdl649
Author

In the official example, the two benchmarks each have their own weights:

VideoGPT-plus/MBZUAI/VideoGPT-plus_Phi3-mini-4k/mvbench
VideoGPT-plus/MBZUAI/VideoGPT-plus_Phi3-mini-4k/vcgbench

@rixejzvdl649
Author

step1
pretrain_projector_image_encoder.sh
step2
pretrain_projector_video_encoder.sh
step3
finetune_dual_encoder.sh
step4
eval/vcgbench/inference/run_ddp_inference.sh
step5
eval/vcgbench/gpt_evaluation/vcgbench_evaluate.sh

So besides steps 1-3 and steps 4-5 above, is there any other information or step that I missed?

@rixejzvdl649
Author

from .dataset_config import *

DataConfig = {
    "PRETRAINING": [CC3M_595K, COCO_CAP, COCO_REG, COCO_REC],

    "FINETUNING": [CONV_VideoChatGPT, VCG_HUMAN, VCG_PLUS_112K, CAPTION_VIDEOCHAT, CLASSIFICATION_K710, CLASSIFICATION_SSV2, CONV_VideoChat1, REASONING_NExTQA, REASONING_CLEVRER_QA, REASONING_CLEVRER_MC, VQA_WEBVID_QA],

    "VCGBench_FINETUNING": [CONV_VideoChatGPT, VCG_HUMAN, VCG_PLUS_112K, CAPTION_VIDEOCHAT, CONV_VideoChat1, VQA_WEBVID_QA],
    "MVBench_FINETUNING": [CLASSIFICATION_K710, CLASSIFICATION_SSV2, CONV_VideoChatGPT, REASONING_NExTQA, REASONING_CLEVRER_QA, REASONING_CLEVRER_MC, VQA_WEBVID_QA],

}

@rixejzvdl649
Author

I didn't use VCGBench_FINETUNING or MVBench_FINETUNING. Will that cause any problems?

@mmaaz60
Member

mmaaz60 commented Jun 28, 2024

Hi @rixejzvdl649,

Thank you for your interest in our work and for providing detailed information about your question. The steps you mentioned for reproducing our results (pretraining + finetuning + evaluation) are correct.

However, please note that we finetune two variants of VideoGPT+. The first variant, finetuned using VCGBench_FINETUNING, is used to evaluate on VCGBench and VCGBench-Diverse; the second variant, finetuned using MVBench_FINETUNING, is used to evaluate on MVBench.

I hope this helps. Please let me know if you have any questions.
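
In case a concrete example helps, a minimal sketch of the two finetuning runs, assuming each one simply swaps the --dataset_use key and writes to its own output directory (directory names are illustrative; all other arguments as in finetune_dual_encoder.sh):

# Variant 1: evaluated on VCGBench and VCGBench-Diverse
# (remaining arguments identical to finetune_dual_encoder.sh above)
deepspeed videogpt_plus/train/train.py \
    --dataset_use VCGBench_FINETUNING \
    --output_dir results/videogpt_plus_finetune_vcgbench

# Variant 2: evaluated on MVBench
# (remaining arguments identical to finetune_dual_encoder.sh above)
deepspeed videogpt_plus/train/train.py \
    --dataset_use MVBench_FINETUNING \
    --output_dir results/videogpt_plus_finetune_mvbench

Each resulting checkpoint is then used with its corresponding eval pipeline, matching the per-benchmark weight folders (vcgbench, mvbench) in the official release.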

@qianwangn

@mmaaz60 Hello, can stage 1 and stage 2 be run in parallel?
