
Veagle: Advancement in Multimodal Representation Learning

Rajat Chawla*, Arkajit Datta*, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chatterjee, Mukunda NS and Ishaan Bhola. *Equal Contribution

SuperAGI


Model Architecture.

Release

  • [1/18] 🔥 We released the training code of Veagle.
  • [1/18] 🔥 We released Veagle: Advancement in Multimodal representation Learning.

Installation

  1. Clone the repository
git clone https://github.com/superagi/Veagle
cd Veagle
  2. Run installation script
source venv/bin/activate
chmod +x install.sh
./install.sh
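
Note that install.sh runs inside a virtual environment named venv. If that environment does not exist yet, a minimal sketch for creating it first (assuming Python 3 is available as python3, and that install.sh installs into whichever environment is active):

python3 -m venv venv        # create the environment the activate line expects
source venv/bin/activate
chmod +x install.sh
./install.sh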

Inference

python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/food.jpeg \
        --question "Is the food given in the image is healthy or not?"
python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/dog.jpeg \
        --question "Write a poem that rhymes very well based on the above image."
python evaluate.py --answer_qs \
        --model_name veagle_mistral \
        --img_path images/astronaut.jpeg \
        --question "What is the significance of this moment in history?"

Train

After downloading the training datasets and specifying their paths in the dataset configs, you are ready for training. We used 8x A100 SXM GPUs in our experiments; please adjust the hyperparameters in the train config file according to your GPU resources (a sketch for a smaller setup follows the finetuning command below). Transformers may take around 2 minutes to load the model, so give it some time to start training. Make sure you have completed the installation procedure before you start training. Here we give an example of training Veagle.

  1. Pretraining of Veagle's visual assistant branch
torchrun --nnodes=1 --nproc_per_node=8 \
    train.py \
    --cfg-path train_configs/pretrain_veagle_mistral.yaml
  2. Instruction finetuning of Veagle

You can run finetuning once pretraining is complete. Make sure to provide the pretrained model's path in the finetuning config.

torchrun --nnodes=1 --nproc_per_node=8 \
    train.py \
    --cfg-path train_configs/finetune_veagle_mistral.yaml
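
If you have fewer than 8 GPUs, lowering --nproc_per_node is the first adjustment; a sketch for a 2-GPU node, assuming the remaining hyperparameters (batch size, gradient accumulation) are scaled in the YAML config:

# Single-node pretraining on 2 GPUs instead of 8; scale batch size /
# gradient accumulation in the train config accordingly.
torchrun --nnodes=1 --nproc_per_node=2 \
    train.py \
    --cfg-path train_configs/pretrain_veagle_mistral.yaml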

Acknowledgement

  • BLIP2 The model architecture of Veagle follows BLIP-2. Don't forget to check out this great open-source work if you don't already know it.
  • BLIVA One of the code bases we took inspiration from.
  • mPLUG-Owl2 One of the code bases we took inspiration from.

License

This repository's code is released under the BSD 3-Clause License. Much of the code is based on BLIVA and mPLUG-Owl2, which are also released under the BSD 3-Clause License.
