Training Models on Task Datasets (Commands and Configurations)

LAVIS provides scripts to pre-train and finetune supported models on standard language-vision tasks, stored at lavis/run_scripts/. To replicate the experiments, run these bash scripts. For example, to train the BLIP model on the image-text retrieval task with the MSCOCO dataset, run

bash run_scripts/blip/train/train_retrieval_coco.sh

Inside the script, we can see

python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml

where we start PyTorch distributed training on 8 GPUs (you may change this according to your own hardware setup). The --cfg-path option specifies a runtime configuration file, which defines the task, model, dataset and training recipe.
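
To run on a different number of GPUs, only the --nproc_per_node value in the command above needs to change. The following is a minimal Python sketch that assembles the same command for a given GPU count. The helper name build_launch_command is hypothetical and is not part of LAVIS.

```python
import shlex


def build_launch_command(cfg_path, num_gpus=8):
    """Assemble the torch.distributed.run command from the training script.

    `build_launch_command` is a hypothetical helper for illustration only.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.run",
        f"--nproc_per_node={num_gpus}",
        "train.py",
        "--cfg-path", cfg_path,
    ]


cmd = build_launch_command(
    "lavis/projects/blip/train/retrieval_coco_ft.yaml", num_gpus=4
)
# Print the command as a single shell-ready string.
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run directly, while the joined string is convenient for pasting into a copy of the bash script.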

The available options and their descriptions are listed below.

====================  ====================================================================
Model Configurations  Functionalities
====================  ====================================================================
arch                  name of the model from the model zoo
                      default: task-dependent
model_type            the type of the model (e.g., base)
                      default: task-dependent
load_pretrained       load pretrained weights
                      default: True (for finetuning task) | False (for pretraining task)
load_finetuned        load task-specific finetuned weights
                      default: False (for finetuning task) | True (for evaluation)
pretrained            URL or local path which stores the pretrained model, defined in the
                      default model configuration file
                      default: task-dependent
finetuned             URL or local path which stores the finetuned model, defined in the
                      default model configuration file
                      default: task-dependent
====================  ====================================================================

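
The interplay of load_pretrained, load_finetuned, pretrained and finetuned can be sketched as a small selection function. This is an illustrative sketch, not LAVIS code: the helper select_checkpoint, the placeholder paths, the arch value, and the rule that finetuned weights take precedence when both flags are set are all assumptions.

```python
def select_checkpoint(model_cfg):
    """Return (kind, location) of the weights to load, or None.

    Assumption: finetuned weights take precedence over pretrained ones
    when both flags are set. The real loading logic may differ.
    """
    if model_cfg.get("load_finetuned", False):
        return ("finetuned", model_cfg["finetuned"])
    if model_cfg.get("load_pretrained", False):
        return ("pretrained", model_cfg["pretrained"])
    return None


# Placeholder values; real configs point to URLs or local paths.
finetune_cfg = {
    "arch": "example_arch",
    "model_type": "base",
    "load_pretrained": True,
    "load_finetuned": False,
    "pretrained": "/path/to/pretrained.pth",
    "finetuned": "/path/to/finetuned.pth",
}
# Evaluation flips load_finetuned to True, matching the defaults listed above.
eval_cfg = dict(finetune_cfg, load_finetuned=True)

print(select_checkpoint(finetune_cfg))
print(select_checkpoint(eval_cfg))
```

With these settings, the finetuning configuration starts from the pretrained weights and the evaluation configuration loads the finetuned ones.
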
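======================  ====================================================================
Dataset Configurations  Functionalities
======================  ====================================================================
vis_processor           pre-processing of visual input
                        default: task-dependent
text_processor          pre-processing of text input
                        default: task-dependent
build_info              dataset information including the storage location, defined in the
                        default dataset configuration file
                        default: task-dependent
======================  ====================================================================
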
======================  ====================================================================
Runtime Configurations  Functionalities
======================  ====================================================================
task                    name of the task
                        default: task-dependent
lr_sched                learning rate scheduler
                        default: linear_warmup_cosine_lr
init_lr                 initial learning rate (after warmup)
                        default: task-dependent
min_lr                  final learning rate after decay
                        default: task-dependent
warmup_lr               starting learning rate for warmup
                        default: init_lr (no warmup)
lr_decay_rate           learning rate decay per epoch for the step learning rate schedule
                        default: 0.9
warmup_steps            number of steps for learning rate warmup
                        default: 0
max_epoch               total number of training epochs
                        default: task-dependent
weight_decay            weight decay coefficient for the optimizer
                        default: 0.05
batch_size_train        batch size during training
                        default: task-dependent
batch_size_eval         batch size during evaluation
                        default: task-dependent
seed                    pseudo random number generator seed
                        default: 42
output_dir              directory to store logs, results and checkpoints
                        default: task-dependent
resume_ckpt_path        path of the checkpoint to resume training from
                        default: None
evaluate                only perform evaluation without training
                        default: False
train_splits            dataset splits used for training
                        default: ["train"]
valid_splits            dataset splits used for validation
                        default: ["val"]
test_splits             dataset splits used for test
                        default: ["test"]
device                  use cpu or gpu (cuda)
                        default: cuda
world_size              number of processes participating in the job
                        default: 1
dist_url                URL specifying how to initialize the process group
                        default: "env://"
distributed             use distributed training
                        default: True
amp                     use automatic mixed precision training
                        default: False
======================  ====================================================================

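
The learning rate options (warmup_lr, init_lr, min_lr, warmup_steps, lr_decay_rate, max_epoch) are easier to read as formulas. The sketch below shows one common way to combine them: a linear warmup, followed by either cosine decay or per-epoch step decay. The function names are hypothetical and the exact formulas used by LAVIS may differ, so treat this as an illustration of the parameter roles only.

```python
import math


def warmup_lr_at(step, warmup_steps, warmup_lr, init_lr):
    """Linearly ramp from warmup_lr to init_lr over warmup_steps steps."""
    if warmup_steps <= 0:
        # warmup_steps defaults to 0, which means no warmup.
        return init_lr
    frac = min(step, warmup_steps) / warmup_steps
    return warmup_lr + (init_lr - warmup_lr) * frac


def cosine_lr_at(epoch, max_epoch, init_lr, min_lr):
    """Cosine decay from init_lr (epoch 0) to min_lr (epoch == max_epoch)."""
    cosine = 1.0 + math.cos(math.pi * epoch / max_epoch)
    return min_lr + 0.5 * (init_lr - min_lr) * cosine


def step_lr_at(epoch, init_lr, min_lr, lr_decay_rate=0.9):
    """Multiply by lr_decay_rate once per epoch, never going below min_lr."""
    return max(min_lr, init_lr * lr_decay_rate ** epoch)


# Example values are made up for illustration.
for epoch in range(0, 11, 5):
    print(epoch, cosine_lr_at(epoch, max_epoch=10, init_lr=1e-5, min_lr=0.0))
```

In this sketch, init_lr is the peak value reached at the end of warmup, and min_lr is the floor that both decay variants approach.
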
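==============================  ==================================================
Text Generation Configurations  Functionalities
==============================  ==================================================
max_len                         maximum number of text tokens to generate
                                default: 20 (for image captioning)
min_len                         minimum number of text tokens to generate
                                default: 5 (for image captioning)
num_beams                       number of beams to perform beam search
                                default: 3
==============================  ==================================================
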
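
To show how max_len, min_len and num_beams interact, here is a simplified beam search in plain Python. It is a toy sketch, not the decoder used by LAVIS: toy_next_log_probs is a hypothetical stand-in for a captioning model, and there is no length normalization or other refinement found in production decoders.

```python
import math

EOS = "<eos>"


def toy_next_log_probs(prefix):
    """Hypothetical stand-in for a model's next-token log-probabilities."""
    # Ending is always the most likely choice, so min_len visibly matters.
    return {"a": math.log(0.3), "b": math.log(0.2), EOS: math.log(0.5)}


def beam_search(next_log_probs, num_beams=3, max_len=20, min_len=5):
    """Return the highest-scoring token sequence (EOS not included)."""
    beams = [((), 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for length in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, logp in next_log_probs(tokens).items():
                if token == EOS:
                    # EOS is only accepted once min_len tokens exist.
                    if length >= min_len:
                        finished.append((tokens, score + logp))
                    continue
                candidates.append((tokens + (token,), score + logp))
        # Keep only the num_beams best partial sequences.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
        if not beams:
            break
    # Sequences that reached max_len without EOS are kept as they are.
    finished.extend(beams)
    return max(finished, key=lambda item: item[1])[0]


print(beam_search(toy_next_log_probs, num_beams=3, max_len=4, min_len=2))
```

Because the toy model always prefers to stop, the result has exactly min_len tokens. Raising min_len forces longer outputs, and max_len caps them.
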
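===================================  ======================================================================
Multimodal Retrieval Configurations  Functionalities
===================================  ======================================================================
negative_all_rank                    collect negatives from all processes for the image-text matching loss
                                     default: True (for coco)
k_test                               number of retrieval candidates ranked from contrastive similarity
                                     default: 256 (for coco)
===================================  ======================================================================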
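
The role of k_test can be sketched as a two-stage ranking: a cheap shortlist by contrastive similarity, followed by re-scoring of only the shortlisted candidates. The second stage (re-scoring with an image-text matching score) is an assumption about how the shortlist is used. The helper rerank_top_k and all scores below are made up for illustration.

```python
def rerank_top_k(sim_row, itm_score, k_test=256):
    """Rank candidates for one query.

    sim_row: contrastive similarity of the query to every candidate.
    itm_score: hypothetical callable mapping a candidate index to a
        (more expensive) image-text matching score.
    Returns candidate indices, best first, restricted to the top k_test.
    """
    k = min(k_test, len(sim_row))
    # Stage 1: cheap shortlist by contrastive similarity.
    by_similarity = sorted(
        range(len(sim_row)), key=lambda i: sim_row[i], reverse=True
    )
    shortlist = by_similarity[:k]
    # Stage 2: re-score only the shortlisted candidates.
    return sorted(shortlist, key=itm_score, reverse=True)


sim_row = [0.1, 0.9, 0.4, 0.8]
toy_itm = {0: 0.99, 1: 0.2, 2: 0.5, 3: 0.7}  # made-up matching scores
print(rerank_top_k(sim_row, toy_itm.get, k_test=2))  # prints [3, 1]
```

Candidate 0 has the highest matching score but never reaches the second stage because it falls outside the top k_test by similarity. A larger k_test reduces such misses at the cost of more re-scoring work.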