
A collection of visual instruction tuning datasets.

Python 73 3 Updated Mar 14, 2024

DataComp for Language Models

HTML 724 59 Updated Jul 24, 2024

MobiLlama: Small Language Model tailored for edge devices

Python 567 41 Updated Mar 3, 2024

CoreNet: A library for training deep neural networks

Python 6,826 528 Updated May 28, 2024

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

Python 225 7 Updated Jun 25, 2024

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 105 4 Updated Jul 9, 2024

Official implementation of project Honeybee (CVPR 2024)

Python 399 18 Updated May 10, 2024

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Python 26 Updated Jul 1, 2024

SEED-Story: Multimodal Long Story Generation with Large Language Model

Python 578 42 Updated Jul 24, 2024

Efficient Multi-modal Models via Stage-wise Visual Context Compression

Python 26 2 Updated Jul 3, 2024

When do we not need larger vision models?

Python 273 8 Updated Jul 12, 2024

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Python 74 1 Updated Jul 12, 2024

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Python 17 Updated Jul 12, 2024

Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation

Python 542 28 Updated Jul 15, 2024

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Jupyter Notebook 455 22 Updated Jul 1, 2024

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 667 50 Updated Jul 9, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,585 95 Updated Jul 10, 2024

Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".

Jupyter Notebook 1,044 86 Updated Dec 23, 2023

Codebase for the WayveScenes101 Dataset

Python 135 4 Updated Jul 16, 2024

[ECCV 2024] Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention

18 1 Updated Jul 10, 2024

Python 1,351 73 Updated Jul 22, 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks

Python 24 Updated Jun 7, 2024

Official implementation of SEED-LLaMA (ICLR 2024).

Python 530 30 Updated Apr 11, 2024

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Python 273 14 Updated Apr 14, 2024

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Python 197 6 Updated May 28, 2024

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A bilingual Chinese-English multimodal large-model series built on the CPM foundation model

Python 1,038 92 Updated Jun 13, 2024

🔥 [ECCV2024] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Python 256 12 Updated Jul 23, 2024

Multimodal Video Understanding Framework (MVU)

Python 21 Updated May 15, 2024

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"

Python 210 6 Updated Jan 17, 2024