  • Shenzhen

Python 367 38 Updated May 1, 2024

MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 1,144 63 Updated Nov 7, 2024
Python 54 2 Updated Jun 27, 2024

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

Python 58 1 Updated Jul 1, 2024
Python 72 Updated Dec 13, 2023

Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges

Python 49 Updated Sep 19, 2024

Examples and guides for using the OpenAI API

MDX 59,759 9,517 Updated Nov 13, 2024

A family of lightweight multimodal models.

Python 930 69 Updated Oct 21, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 224 27 Updated Aug 15, 2024
Python 13 1 Updated Sep 13, 2024

[Preprint] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Python 65 1 Updated Oct 10, 2024

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Python 321 29 Updated May 8, 2024

Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

Python 54 Updated Aug 30, 2024

Inference of InternVL model on V100

Python 5 Updated May 11, 2024

Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval"

Python 46 1 Updated Nov 1, 2024

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Python 225 11 Updated Jun 13, 2024

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Python 245 13 Updated Sep 15, 2024
Python 2,855 235 Updated Oct 16, 2024

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 2,113 145 Updated Sep 3, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o

Python 5,995 465 Updated Oct 29, 2024

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,799 259 Updated Jun 4, 2024

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 553 60 Updated Oct 4, 2024

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Python 33,911 2,572 Updated Nov 13, 2024

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"

Python 191 9 Updated Sep 3, 2024

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-V…

Python 4,209 370 Updated Nov 13, 2024

[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Python 302 16 Updated Nov 4, 2024

This is an official implementation of our CVPR 2020 paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" (https://arxiv.org/abs/1908.10357)

Python 1,339 272 Updated Apr 12, 2021

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 4,656 452 Updated Nov 5, 2024

UniMD: Towards Unifying Moment retrieval and temporal action Detection

Python 37 1 Updated Jul 5, 2024

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Python 839 60 Updated Jul 6, 2024