View CserDu's full-sized avatar

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,278 92 Updated Jul 5, 2024
Go 2 Updated Jun 8, 2024

Official implementation of SEED-LLaMA (ICLR 2024).

Python 530 30 Updated Apr 11, 2024

A method to increase the speed and lower the memory footprint of existing vision transformers.

Python 908 68 Updated Jun 17, 2024

[CVPR 2024 Highlight] Official GraCo: Granularity-Controllable Interactive Segmentation.

Python 39 Updated Jul 19, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,381 355 Updated Jul 11, 2024

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 6,431 366 Updated Jul 18, 2024

Retrieval and Retrieval-augmented LLMs

Python 6,153 442 Updated Jul 14, 2024

Long Context Transfer from Language to Vision

Python 242 12 Updated Jul 12, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

997 55 Updated Jul 23, 2024

Awesome papers & datasets specifically focused on long-term videos.

122 4 Updated Jul 15, 2024

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 663 49 Updated Jul 9, 2024

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Python 464 38 Updated Jun 16, 2024

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 29,184 4,020 Updated Jul 17, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 18,298 1,997 Updated Jul 14, 2024

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 1,664 113 Updated Jul 2, 2024

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 629 48 Updated Mar 25, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 549 33 Updated Jul 19, 2024

mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

Python 2,031 158 Updated Apr 5, 2024

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Python 191 24 Updated Jul 19, 2024
Python 1,325 72 Updated Jul 22, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

315 11 Updated Jun 18, 2024

Download links and backup download links for every Clash version from the official Clash website

218 17 Updated Jul 5, 2024

Official code for "A Closer Look at Audio-Visual Segmentation"

Python 105 18 Updated Jul 11, 2024

Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

Python 2,369 366 Updated Feb 17, 2024

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Jupyter Notebook 1,075 205 Updated May 21, 2023

[ECCV 2022] Official implementation of the paper: Audio-Visual Segmentation

Python 438 41 Updated Nov 28, 2023
Python 542 25 Updated Feb 15, 2024
Jupyter Notebook 213 26 Updated Dec 18, 2023