yingsen1

Wilson, Tsang yingsen1

Graduated from BUPT, interested in visual tracking, object detection, video understanding.

5 followers · 13 following

Shenzhen

Stars

zhuyiche / llava-phi

Python 367 38 Updated May 1, 2024

facebookresearch / MobileLLM

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 1,144 63 Updated Nov 7, 2024

DCDmllm / Momentor

Python 54 2 Updated Jun 27, 2024

doc-doc / NExT-GQA

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

Python 58 1 Updated Jul 1, 2024

egoschema / EgoSchema

Python 72 Updated Dec 13, 2023

bigai-nlco / VideoLLaMB

Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges

Python 49 Updated Sep 19, 2024

openai / openai-cookbook

Examples and guides for using the OpenAI API

MDX 59,759 9,517 Updated Nov 13, 2024

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

Python 930 69 Updated Oct 21, 2024

showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 224 27 Updated Aug 15, 2024

minghangz / TFVTG

Python 13 1 Updated Sep 13, 2024

gyxxyg / VTG-LLM

[Preprint] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Python 65 1 Updated Oct 10, 2024

showlab / UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Python 321 29 Updated May 8, 2024

KangarooGroup / Kangaroo

Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

Python 54 Updated Aug 30, 2024

BIGBALLON / itvl1.5-v100-test

Inference of InternVL model on V100

Python 5 Updated May 11, 2024

sudo-Boris / mr-Blip

Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval"

Python 46 1 Updated Nov 1, 2024

huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Python 225 11 Updated Jun 13, 2024

wanghao9610 / OV-DINO

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Python 245 13 Updated Sep 15, 2024

LLaVA-VL / LLaVA-NeXT

Python 2,855 235 Updated Oct 16, 2024

THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 2,113 145 Updated Sep 3, 2024

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.

Python 5,995 465 Updated Oct 29, 2024

DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,799 259 Updated Jun 4, 2024

Vision-CAIR / MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 553 60 Updated Oct 4, 2024

gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Python 33,911 2,572 Updated Nov 13, 2024

zamling / PSALM

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"

Python 191 9 Updated Sep 3, 2024

modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-V…

Python 4,209 370 Updated Nov 13, 2024

lzw-lzw / GroundingGPT

[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Python 302 16 Updated Nov 4, 2024

HRNet / HigherHRNet-Human-Pose-Estimation

This is an official implementation of our CVPR 2020 paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" (https://arxiv.org/abs/1908.10357)

Python 1,339 272 Updated Apr 12, 2021

AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 4,656 452 Updated Nov 5, 2024

yingsen1 / UniMD

UniMD: Towards Unifying Moment retrieval and temporal action Detection

Python 37 1 Updated Jul 5, 2024

OpenGVLab / VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Python 839 60 Updated Jul 6, 2024
