Ziyang412

TVBench: Redesigning Video-Language Evaluation

Python · 7 stars · Updated Oct 25, 2024

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook · 1,102 stars · 103 forks · Updated Nov 3, 2024

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Python · 225 stars · 11 forks · Updated Jun 13, 2024

Python · 72 stars · Updated Dec 13, 2023

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python · 2,049 stars · 149 forks · Updated Nov 14, 2024

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Python · 179 stars · 12 forks · Updated Oct 12, 2024

Some preliminary explorations of Mamba's context scaling.

Python · 191 stars · 10 forks · Updated Feb 8, 2024

Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges

Python · 49 stars · Updated Sep 19, 2024

[ECCVW'24] Long-form Video Understanding by Bridging Episodic Memory and Semantic Knowledge

Python · 14 stars · 2 forks · Updated Sep 27, 2024

Playground code for the paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Python · 1 star · Updated Aug 6, 2024

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

Python · 129 stars · 7 forks · Updated Aug 11, 2024

A method to increase the speed and lower the memory footprint of existing vision transformers.

Python · 969 stars · 69 forks · Updated Jun 17, 2024

Long Context Transfer from Language to Vision

Python · 334 stars · 17 forks · Updated Oct 26, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

406 stars · 12 forks · Updated Jun 18, 2024

Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Python · 390 stars · 16 forks · Updated Jun 15, 2024

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Python · 645 stars · 46 forks · Updated Sep 27, 2024

[NeurIPS'24 D&B] Official dataloader and evaluation scripts for LongVideoBench.

Python · 66 stars · 2 forks · Updated Jul 27, 2024

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"

Python · 1 star · Updated Mar 20, 2024

Python · 38 stars · 6 forks · Updated Jul 10, 2024

The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Python · 1,555 stars · 212 forks · Updated Apr 9, 2024

An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"

Python · 136 stars · 15 forks · Updated Apr 6, 2024

Awesome papers & datasets specifically focused on long-term videos.

199 stars · 8 forks · Updated Oct 17, 2024

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Python · 156 stars · 23 forks · Updated Sep 24, 2023

Official code repository for: DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning (COLM 2024)

Python · 119 stars · 16 forks · Updated Sep 12, 2024

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python · 130 stars · 11 forks · Updated Jul 25, 2024

Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding"

Python · 16 stars · Updated Nov 11, 2024

The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models. Paper: https://arxiv.org/abs/2402.01620

Python · 30 stars · 6 forks · Updated Feb 5, 2024

Code for ACL 2024 paper "Soft Self-Consistency Improves Language Model Agents"

Python · 16 stars · 1 fork · Updated Sep 11, 2024

PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"

Python · 28 stars · 1 fork · Updated Mar 4, 2024