- Alibaba Group
- Beijing, Chaoyang
- https://scholar.google.com/citations?hl=en&user=VfovrnEAAAAJ
Stars
A collection of visual instruction tuning datasets.
MobiLlama: Small Language Model tailored for edge devices
CoreNet: A library for training deep neural networks
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Official implementation of project Honeybee (CVPR 2024)
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
SEED-Story: Multimodal Long Story Generation with Large Language Model
Efficient Multi-modal Models via Stage-wise Visual Context Compression
When do we not need larger vision models?
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Code for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models"
[ECCV 2024] Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
Learning 1D Causal Visual Representation with De-focus Attention Networks
Official implementation of SEED-LLaMA (ICLR 2024).
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | Chinese-English bilingual multimodal large models built on the CPM foundation model
🔥 [ECCV2024] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"