Starred repositories

Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"

Python 71 1 Updated Oct 25, 2024

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Python 174 10 Updated Sep 16, 2024

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 3,071 252 Updated Sep 5, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 3,124 190 Updated Oct 4, 2024

Code for AUTOPLAN, published in Acta Automatica Sinica (自动化学报); uses large language models to plan and execute complex tasks.

Python 77 6 Updated Nov 14, 2024

Personal project repository: applications of large language models and multimodal models.

Python 119 7 Updated Nov 6, 2024

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Python 47 4 Updated Jul 10, 2024

DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.

Python 1,236 186 Updated Nov 18, 2024

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Python 245 11 Updated Jan 2, 2024

A repository curated by the MLNLP community to help authors avoid small mistakes in paper submissions. Paper Writing Tips

3,621 468 Updated May 29, 2022
Python 65 6 Updated Jul 30, 2024

Multimodal Video Understanding Framework (MVU)

Python 23 Updated May 15, 2024

[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

Python 9 3 Updated Jul 25, 2024
Python 120 5 Updated Sep 29, 2024

Awesome papers & datasets specifically focused on long-term videos.

209 9 Updated Nov 15, 2024

Code repository supporting the paper "Atlas: Few-shot Learning with Retrieval Augmented Language Models" (https://arxiv.org/abs/2208.03299)

Python 516 67 Updated Nov 28, 2023

Materials for the Hugging Face Diffusion Models Course

Jupyter Notebook 169 21 Updated Feb 27, 2023

"Hands-On Research" (《动手做科研》): a step-by-step guide for beginners on getting started in AI research.

Jupyter Notebook 248 7 Updated Nov 9, 2024

Long Context Transfer from Language to Vision

Python 334 17 Updated Oct 26, 2024

This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)

Python 131 5 Updated Sep 9, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

406 12 Updated Jun 18, 2024

Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Python 80 3 Updated Aug 6, 2024

This repository is curated by the MLNLP community to help authors avoid small mistakes in paper submissions. Paper Writing Tips

4 1 Updated May 29, 2022

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Python 244 27 Updated Jul 19, 2024

Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering"

Python 35 3 Updated Jul 5, 2024

Language Repository for Long Video Understanding

Python 28 3 Updated Jun 17, 2024

A collection of awesome text-to-image generation studies.

TeX 426 24 Updated Nov 5, 2024

Search, organize, discover anything!

Jupyter Notebook 47 5 Updated Apr 18, 2024