UNC Chapel Hill
- https://ziyangw2000.github.io/
Stars
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
Accelerating the development of large multimodal models (LMMs) with lmms-eval
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Some preliminary explorations of Mamba's context scaling.
Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
[ECCVW'24] Long-form Video Understanding by Bridging Episodic Memory and Semantic Knowledge
WeiKangda / VideoTree
Forked from Ziyang412/VideoTree
Playground Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
A method to increase the speed and lower the memory footprint of existing vision transformers.
Long Context Transfer from Language to Vision
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
[NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
Ziyang412 / LLoVi
Forked from CeeZh/LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Awesome papers & datasets specifically focused on long-term videos.
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Official code repository for: DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning (COLM 2024)
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Implementation of the model MC-ViT from the paper "Memory Consolidation Enables Long-Context Video Understanding"
The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models. Paper: https://arxiv.org/abs/2402.01620
Code for ACL 2024 paper "Soft Self-Consistency Improves Language Model Agents"
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"