Starred repositories

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 2,973 244 Updated Sep 5, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 1,740 96 Updated Sep 10, 2024

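This project is the code repository for AUTOPLAN, published in Acta Automatica Sinica. It uses large language models to handle task planning and task execution for complex tasks.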

Python 56 6 Updated Aug 27, 2024

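Projects and tutorials for MiniCPM and MiniCPM-V, covering six topics: inference, quantization, edge deployment, fine-tuning, technical reports, and applications.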

Python 83 3 Updated Sep 11, 2024

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Python 44 4 Updated Jul 10, 2024

DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.

Python 1,156 184 Updated Aug 19, 2024

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Python 233 11 Updated Jan 2, 2024

A repository curated by the MLNLP community to help everyone avoid small mistakes in paper submissions. Paper Writing Tips

3,474 447 Updated May 29, 2022
Python 47 3 Updated Jul 30, 2024

Please refer to our official repo at https://github.com/IVGSZ/Flash-VStream.

Python 47 8 Updated Aug 15, 2024

Multimodal Video Understanding Framework (MVU)

Python 22 Updated May 15, 2024

[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

Python 9 2 Updated Jul 25, 2024
Python 99 5 Updated Apr 15, 2024

Awesome papers & datasets specifically focused on long-term videos.

153 4 Updated Jul 15, 2024

Code repository supporting the paper "Atlas: Few-shot Learning with Retrieval Augmented Language Models" (https://arxiv.org/abs/2208.03299)

Python 508 68 Updated Nov 28, 2023

Materials for the Hugging Face Diffusion Models Course

Jupyter Notebook 159 20 Updated Feb 27, 2023

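Hands-On Research (《动手做科研》) is aimed at research beginners and shows, step by step, how to get started in artificial intelligence research.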

Jupyter Notebook 168 5 Updated Sep 1, 2024

Long Context Transfer from Language to Vision

Python 291 16 Updated Aug 26, 2024

This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)

Python 93 5 Updated Sep 9, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

364 11 Updated Jun 18, 2024

Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Python 66 2 Updated Aug 6, 2024

This repository is curated by the MLNLP community to help everyone avoid small mistakes in paper submissions. Paper Writing Tips

3 1 Updated May 29, 2022

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Python 209 27 Updated Jul 19, 2024

Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering"

Python 23 3 Updated Jul 5, 2024

Language Repository for Long Video Understanding

Python 27 3 Updated Jun 17, 2024

A collection of awesome text-to-image generation studies.

TeX 319 16 Updated Sep 5, 2024

Search, organize, discover anything!

Jupyter Notebook 44 5 Updated Apr 18, 2024

[CSUR] A Survey on Video Diffusion Models

1,684 85 Updated Aug 26, 2024

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Python 1,741 158 Updated May 25, 2024