🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 150 Updated Oct 12, 2024

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,565 128 Updated Oct 20, 2024

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Python 165 10 Updated Oct 12, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 28,792 4,268 Updated Oct 20, 2024

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python 6,553 671 Updated Aug 12, 2024

High-resolution models for human tasks.

Python 4,284 230 Updated Oct 15, 2024

Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"

Python 70 3 Updated Sep 19, 2024

Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄); read online at https://datawhalechina.github.io/easy-rl/

Jupyter Notebook 9,301 1,850 Updated Sep 9, 2024
Python 118 3 Updated Oct 17, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 11,405 1,017 Updated Oct 19, 2024

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

Python 118 4 Updated Sep 10, 2024

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Python 288 15 Updated May 27, 2024

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1,185 106 Updated Aug 27, 2024
Python 123 8 Updated Sep 25, 2024
Python 2,676 209 Updated Oct 16, 2024

An open-source implementation for training LLaVA-NeXT.

Python 274 12 Updated Oct 15, 2024

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

Python 139 4 Updated Jul 1, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Python 2,483 154 Updated Oct 10, 2024

Official repository for the paper PLLaVA

Python 575 39 Updated Jul 28, 2024

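Xiaohongshu note and comment crawler, Douyin video and comment crawler, Kuaishou video and comment crawler, Bilibili video and comment crawler, Weibo post and comment crawler, Baidu Tieba post and comment-reply crawler, Zhihu Q&A article and comment crawler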

Python 17,063 5,401 Updated Oct 19, 2024

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Python 509 19 Updated Jun 26, 2024

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 6,170 545 Updated May 31, 2024

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

TypeScript 31,642 5,516 Updated Oct 15, 2024

OpenMMLab Foundational Library for Training Deep Learning Models

Python 1,161 351 Updated Sep 20, 2024

Tool for automating common video key-frame extraction, video compression and Image Auto-crop/Image-resize tasks

Python 307 58 Updated Aug 6, 2024

A simple Python tool that extracts key frames from a video file using peak estimation on frame differences.

Python 139 25 Updated Aug 7, 2024

Video QA Assistant based on LLMs with frame convolution

Python 199 6 Updated Dec 15, 2023

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Python 802 44 Updated Oct 16, 2024