Stars
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
Accelerating the development of large multimodal models (LMMs) with lmms-eval
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
A high-throughput and memory-efficient inference and serving engine for LLMs
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
High-resolution models for human tasks.
Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"
Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online: https://datawhalechina.github.io/easy-rl/
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
An open-source implementation for training LLaVA-NeXT.
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
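Xiaohongshu note and comment crawler, Douyin video and comment crawler, Kuaishou video and comment crawler, Bilibili video and comment crawler, Weibo post and comment crawler, Baidu Tieba post and comment-reply crawler, Zhihu Q&A article and comment crawler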
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…
OpenMMLab Foundational Library for Training Deep Learning Models
Tool for automating common video key-frame extraction, video compression and Image Auto-crop/Image-resize tasks
A simple Python tool that extracts key frames from a video file using peak estimation from frame differences.
Video QA Assistant based on LLMs with frame convolution
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding