forrestbing · Hangzhou, Zhejiang

Showing results

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 110 4 Updated Jul 26, 2024

EVE: Encoder-Free Vision-Language Models from BAAI

Python 176 3 Updated Jul 20, 2024

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Python 2,035 123 Updated Jun 25, 2024

Odyssey: Empowering Agents with Open-World Skills

Python 154 5 Updated Jul 29, 2024

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python 3,527 242 Updated Mar 5, 2024

A state-of-the-art open visual language model (multimodal pretraining model).

Python 5,716 393 Updated May 29, 2024

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

Python 28 2 Updated Jul 11, 2024

PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. Accepted to CVPR 2024.

Python 157 4 Updated Jun 3, 2024

Analysis of Chinese and English document layouts.

Python 56 6 Updated Jul 19, 2024

Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tab…

Python 73 2 Updated Jul 19, 2024

Align Anything: Training Any Modality Model with Feedback

Python 64 17 Updated Jul 28, 2024

A family of compressed models obtained via pruning and knowledge distillation

70 5 Updated Jul 26, 2024

Prompt engineering, automated.

Jupyter Notebook 116 7 Updated Jul 28, 2024

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript 926 85 Updated Jan 31, 2024

Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).

Python 123 2 Updated Jul 19, 2024

Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

74 3 Updated Jul 24, 2024

4M: Massively Multimodal Masked Modeling

Python 1,448 83 Updated Jul 17, 2024

A minimal codebase for finetuning large multimodal models, supporting llava-1.5, qwen-vl, llava-interleave, llava-next-video, phi3-v etc.

Python 75 2 Updated Jul 28, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,606 99 Updated Jul 26, 2024

An Open-source Toolkit for LLM Development

Python 2,642 168 Updated May 24, 2024

Official code for Paper "Mantis: Multi-Image Instruction Tuning"

Python 127 9 Updated Jul 25, 2024

Text-Guided Generation of Full-Body Image with Preserved Reference Face for Customized Animation

Python 22 3 Updated Jun 24, 2024

[CVPR 2024] Code release for "Unsupervised Universal Image Segmentation"

Python 160 5 Updated May 7, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 1,907 180 Updated Apr 24, 2024

The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.

Python 4,447 335 Updated May 28, 2024

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 1,670 90 Updated Jul 25, 2024