zhegan27

[CVPR 2023] Official implementation of X-Decoder: generalized decoding for pixel, image, and language

Python · 1,276 stars · 130 forks · Updated Oct 5, 2023

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Python · 290 stars · 30 forks · Updated Jan 8, 2024

Jupyter Notebook · 82 stars · 11 forks · Updated Dec 4, 2022

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Python · 127 stars · 11 forks · Updated Oct 10, 2023

Python · 75 stars · 7 forks · Updated Jan 23, 2023

PyTorch implementation of a 1.3B text-to-image generation model trained on 14 million image-text pairs

Python · 628 stars · 66 forks · Updated Aug 9, 2022

GIT: A Generative Image-to-text Transformer for Vision and Language

Python · 541 stars · 67 forks · Updated Dec 2, 2023

UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)

Python · 84 stars · 7 forks · Updated Jun 12, 2023

PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529

Python · 156 stars · 19 forks · Updated Jul 19, 2022

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

Python · 1,194 stars · 133 forks · Updated Mar 18, 2024

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"

Python · 235 stars · 34 forks · Updated May 26, 2022

Code release for SLIP: Self-supervision meets Language-Image Pre-training

Python · 735 stars · 67 forks · Updated Feb 9, 2023

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)

Jupyter Notebook · 136 stars · 6 forks · Updated Nov 27, 2023

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Python · 2,379 stars · 247 forks · Updated Apr 24, 2024

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

Python · 1,837 stars · 207 forks · Updated Mar 21, 2024

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook · 4,542 stars · 610 forks · Updated Aug 5, 2024

A PyTorch implementation of VIOLET

Python · 136 stars · 6 forks · Updated Dec 17, 2023

iBOT 🤖: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)

Jupyter Notebook · 651 stars · 76 forks · Updated Apr 14, 2022

Grounded Language-Image Pre-training

Python · 2,110 stars · 187 forks · Updated Jan 24, 2024

MLPs for Vision and Language Modeling (Coming Soon)

27 stars · Updated Dec 9, 2021

METER: A Multimodal End-to-end TransformER Framework

Python · 357 stars · 30 forks · Updated Nov 16, 2022

A full-fledged version of Pix2Seq

Python · 235 stars · 20 forks · Updated Nov 6, 2021

[TMLR] "Adversarial Feature Augmentation and Normalization for Visual Recognition", Tianlong Chen, Yu Cheng, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu

Python · 20 stars · 2 forks · Updated Nov 27, 2022

PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)

Python · 356 stars · 57 forks · Updated Jul 29, 2023

PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer (NeurIPS 2021)

Python · 56 stars · 8 forks · Updated Feb 6, 2023

A novel method to tune language models. Code and datasets for the paper "GPT understands, too".

Python · 910 stars · 111 forks · Updated Oct 6, 2022

Code for ALBEF: a new vision-language pre-training method

Python · 1,470 stars · 191 forks · Updated Sep 20, 2022

Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)

Python · 1,629 stars · 190 forks · Updated May 20, 2024

Python · 126 stars · 15 forks · Updated Aug 18, 2022

[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"

Python · 543 stars · 59 forks · Updated Mar 27, 2022