GitHub profile: gyxxyg

Paper list on multimodal and large language models, used only to record the papers I read from the daily arXiv feed for personal reference.

541 stars · 34 forks · Updated Nov 3, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python · 2,044 stars · 160 forks · Updated Oct 31, 2024

Official code for Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification (NeurIPS 2024 Spotlight)

Python · 6 stars · Updated Oct 21, 2024

Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"

Python · 364 stars · 9 forks · Updated Sep 2, 2024

Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊

222 stars · 7 forks · Updated Nov 2, 2024

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook · 768 stars · 66 forks · Updated Nov 4, 2024

[Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Python · 34 stars · Updated Nov 2, 2024

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

Python · 213 stars · 14 forks · Updated Aug 11, 2024

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Python · 51 stars · 4 forks · Updated Nov 3, 2024

Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

Python · 67 stars · 5 forks · Updated Oct 28, 2024

Next-Token Prediction is All You Need

Python · 1,764 stars · 65 forks · Updated Oct 24, 2024

👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)

Python · 23 stars · Updated Nov 4, 2024

Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models

131 stars · 6 forks · Updated Nov 4, 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python · 997 stars · 48 forks · Updated Jan 16, 2024

A family of open-sourced Mixture-of-Experts (MoE) Large Language Models

Python · 1,382 stars · 71 forks · Updated Mar 8, 2024

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

109 stars · 2 forks · Updated Nov 3, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

4,967 stars · 278 forks · Updated Nov 1, 2024

Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"

Python · 22 stars · 2 forks · Updated Jul 30, 2024

Writing AI Conference Papers: A Handbook for Beginners

1,278 stars · 44 forks · Updated Oct 29, 2024

Python · 12 stars · 1 fork · Updated Sep 13, 2024

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Python · 175 stars · 11 forks · Updated Oct 12, 2024

A VideoQA dataset based on videos from ActivityNet

Python · 67 stars · 9 forks · Updated Nov 22, 2020

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python · 2,906 stars · 172 forks · Updated Oct 4, 2024

Video Question Answering via Gradually Refined Attention over Appearance and Motion

Python · 152 stars · 27 forks · Updated Dec 5, 2017

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Python · 937 stars · 55 forks · Updated Oct 24, 2024

Triton-based implementation of Sparse Mixture of Experts.

Python · 183 stars · 14 forks · Updated Oct 10, 2024

VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Python · 1,958 stars · 156 forks · Updated Oct 31, 2024

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python · 2,308 stars · 176 forks · Updated Oct 15, 2024

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch

Python · 474 stars · 27 forks · Updated Oct 25, 2024

Long Context Transfer from Language to Vision

Python · 326 stars · 17 forks · Updated Oct 26, 2024