Stars
A curated list for Efficient Large Language Models
[ECCV 2024] Empowering Multimodal Large Language Model as a Powerful Data Generator
[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
GPT4V-level open-source multi-modal model based on Llama3-8B
ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model approaching GPT-4o performance
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
An Open-source Toolkit for LLM Development
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
A Framework of Small-scale Large Multimodal Models
PyTorch implementation for the paper "Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving"
A collection of deep learning based RGB-T-Fusion methods, codes, and datasets. The main directions involved are Multispectral Pedestrian Detection, RGB-T Aerial Object Detection, RGB-T Semantic Seg…
Official Code for Paper "GDRNet: Towards Generalizable Diabetic Retinopathy Grading in Unseen Domains"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
A collection of CVPR 2023 papers and open-source projects