Microsoft
Stars
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
PyTorch implementation of a 1.3B text-to-image generation model trained on 14 million image-text pairs
GIT: A Generative Image-to-text Transformer for Vision and Language
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
Code release for SLIP: Self-supervision meets Language-Image Pre-training
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
iBOT 🤖: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
METER: A Multimodal End-to-end TransformER Framework
[TMLR] "Adversarial Feature Augmentation and Normalization for Visual Recognition", Tianlong Chen, Yu Cheng, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer (NeurIPS 2021)
A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too".
Code for ALBEF: a new vision-language pre-training method
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"