Stars
[NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
A paper list of recent works on token compression for ViT and VLM
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"
Effortless data labeling with AI support from Segment Anything and other awesome models.
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
LAVIS - A One-stop Library for Language-Vision Intelligence
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Exploration of Adept's multimodal fuyu-8b model. 🤓 🔍
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
A curated list for Efficient Large Language Models
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"