
Veagle: Advancement in Multimodal Representation Learning

Rajat Chawla*, Arkajit Datta*, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chatterjee, Mukunda NS and Ishaan Bhola. *Equal Contribution

SuperAGI


Model Architecture.

Release

  • [1/18] 🔥 We released the training code of Veagle.
  • [1/18] 🔥 We released Veagle: Advancement in Multimodal representation Learning.

Installation

  1. Clone the repository
git clone https://github.com/superagi/Veagle
cd Veagle
  2. Run installation script
source venv/bin/activate
chmod +x install.sh
./install.sh
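
Note that install.sh runs inside a virtual environment named venv. If that environment does not exist yet, a minimal sketch for creating it first (assuming Python 3 is available as python3, and that install.sh installs into whichever environment is active):

python3 -m venv venv        # create the environment the activate line expects
source venv/bin/activate
chmod +x install.sh
./install.sh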

Inference

python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/food.jpeg \
        --question "Is the food given in the image is healthy or not?"
python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/dog.jpeg \
        --question "Write a poem that rhymes very well based on the above image."
python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/astronaut.jpeg \
        --question "What is the significance of this moment in history?"

Train

After downloading the training datasets and specifying their paths in the dataset configs, you are ready for training. We used 8x A100 SXM GPUs in our experiments; please adjust the hyperparameters in the train config file according to your GPU resources (a sketch for a smaller setup follows the finetuning command below). Transformers may take around 2 minutes to load the model, so give it some time to start training. Make sure you have completed the installation procedure before you start training. Here we give an example of training Veagle.

  1. Pretraining of Veagle's visual assistant branch
torchrun --nnodes=1 --nproc_per_node=8 \
    train.py \
    --cfg-path train_configs/pretrain_veagle_mistral.yaml
  2. Instruction finetuning of Veagle

You can run finetuning once pretraining is complete. Make sure to provide the pretrained model's path in the finetuning config.

torchrun --nnodes=1 --nproc_per_node=8 \
    train.py \
    --cfg-path train_configs/finetune_veagle_mistral.yaml
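
If you have fewer than 8 GPUs, lowering --nproc_per_node is the first adjustment; a sketch for a 2-GPU node, assuming the remaining hyperparameters (batch size, gradient accumulation) are scaled in the YAML config:

# Single-node pretraining on 2 GPUs instead of 8; scale batch size /
# gradient accumulation in the train config accordingly.
torchrun --nnodes=1 --nproc_per_node=2 \
    train.py \
    --cfg-path train_configs/pretrain_veagle_mistral.yaml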

Acknowledgement

  • BLIP2 The model architecture of Veagle follows BLIP-2. Don't forget to check out this great open-source work if you don't already know it.
  • BLIVA One of the code bases we took inspiration from.
  • mPLUG-Owl2 One of the code bases we took inspiration from.

License

This repository's code is released under the BSD 3-Clause License. Much of the code is based on BLIVA and mPLUG-Owl2, which are also released under the BSD 3-Clause License.
