The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Official implementation of SEED-LLaMA (ICLR 2024).
A method to increase the speed and lower the memory footprint of existing vision transformers.
[CVPR 2024 Highlight] Official GraCo: Granularity-Controllable Interactive Segmentation.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
Retrieval and Retrieval-augmented LLMs
Long Context Transfer from Language to Vision
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Awesome papers & datasets specifically focused on long-term videos.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
Code and documentation to train Stanford's Alpaca models, and generate the data.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Official code for "A Closer Look at Audio-Visual Segmentation"
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
[ECCV 2022] Official implementation of the paper: Audio-Visual Segmentation