vision-language-transformer

Here are 9 public repositories matching this topic...

atharva-naik / MMML-TermProject-VizWiz-VQA-Challenge

VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)

open-source opencv natural-language-processing computer-vision image-processing pytorch question-answering open-source-project carnegie-mellon-university term-project visual-question-answering vizwiz vision-language vision-language-transformer vizwiz-vqa

Updated Sep 13, 2023
Python

unitaryai / VTC

VTC: Improving Video-Text Retrieval with User Comments

comments video-understanding multimodal-deep-learning video-text-retrieval vision-language-transformer vision-language-pretraining

Updated Jun 18, 2024
Python

yiren-jian / BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

multimodal-deep-learning vision-language-transformer vision-language-pretraining

Updated Dec 5, 2023
Python

lhao499 / instructrl

Instruction Following Agents with Multimodal Transformers

machine-learning reinforcement-learning instructions transformer flax jax instruction-following vision-language-transformer

Updated Nov 3, 2022
Python

sdc17 / UPop

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

framework sparsity image-captioning pruning structured model-compression visual-reasoning multimodal-learning visual-question-answering weight-pruning efficient-deep-learning vision-transformer vision-language-transformer image-text-retrieval text-image-retrieval

Updated Nov 4, 2023
Python

henghuiding / Vision-Language-Transformer

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation

tensorflow keras transformer vision-language referring-segmentation tpami iccv2021 vision-language-transformer

Updated Jan 7, 2022
Python

shenyunhang / APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

open-world object-detection image-segmentation referring-expression-comprehension vision-language-transformer

Updated May 8, 2024
Python

henghuiding / ReLA

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

multimodal-learning referring-image-segmentation referring-expression-segmentation referring-expression-comprehension vision-language-transformer cvpr2023

Updated Sep 5, 2023
Python

IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

open-world object-detection vision-language vision-language-transformer open-world-detection

Updated Jun 28, 2024
Python

