  • Beihang University

304 results for source starred repositories

Bring portraits to life!

Python 12,220 1,288 Updated Oct 7, 2024

[NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models

Python 93 11 Updated Jul 1, 2024

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 166 8 Updated Sep 16, 2024

A paper list of recent work on token compression for ViT and VLMs

65 Updated Oct 9, 2024

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 5,139 417 Updated Oct 2, 2024

Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"

Python 68 3 Updated Sep 19, 2024

Effortless data labeling with AI support from Segment Anything and other awesome models.

Python 3,863 446 Updated Oct 2, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Python 194 11 Updated Apr 3, 2024

✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Python 73 5 Updated Sep 29, 2024

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Python 510 43 Updated Sep 19, 2024

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,788 125 Updated Sep 26, 2024

[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Python 297 13 Updated Oct 7, 2024

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 9,746 954 Updated Aug 23, 2024
Python 341 13 Updated Jul 29, 2024

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Python 114 1 Updated Sep 27, 2024
Python 15 Updated Aug 9, 2024

Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Jupyter Notebook 106 2 Updated Sep 21, 2024

[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models

Python 211 3 Updated Oct 2, 2024

Exploration of Adept's multimodal Fuyu-8B model. 🤓 🔍

Jupyter Notebook 28 4 Updated Nov 7, 2023

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python 3,561 241 Updated Mar 5, 2024

Diffusion Feedback Helps CLIP See Better

Python 206 11 Updated Aug 24, 2024

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python 1,249 47 Updated Oct 2, 2024

A curated list for Efficient Large Language Models

Python 1,164 85 Updated Oct 8, 2024

(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator

Python 77 Updated Oct 8, 2024

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 241 9 Updated Aug 12, 2024

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python 545 57 Updated Jun 7, 2024

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

Python 9 1 Updated Mar 6, 2024

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Python 631 31 Updated Aug 13, 2024

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python 859 121 Updated Apr 12, 2024

VisionLLM Series

Python 871 23 Updated Sep 13, 2024