Stars
Modular and customizable Material Design UI components for the web
Multi-task and multi-track music transcription for everyone
Audio Plugin for Audio to MIDI transcription using deep learning.
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
To be the world's best PyTorch project template.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
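Generate all kinds of short videos in batches with one click using AI, automatically batch-remix short videos, and automatically publish them to Douyin, Kuaishou, Xiaohongshu, and WeChat Channels. Making money has never been this easy! Supports the local speech models chatTTS, fasterwhisper, and GPTSoVITS, and the cloud speech services Azure, Alibaba Cloud, and Tencent Cloud. Supports direct AI image generation with Stable diffusion and comfyUI. Generate short videos with one click using A…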
Generate high-definition short videos with one click using large AI models (LLMs).
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Understand Human Behavior to Align True Needs
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (V…
av-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
MambaOut: Do We Really Need Mamba for Vision?
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
[ICLR 2024🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Official release of InternLM2.5 base and chat models. 1M context support
SALMONN: Speech Audio Language Music Open Neural Network
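🚀 One-click deployment! A true AI chatbot! Supports ChatGPT, ERNIE Bot, iFlytek Spark, Bing, Bard, ChatGLM, and POE, with multiple accounts, persona tuning, virtual maid, image rendering, and voice messages | Supports QQ, Telegram, Discord, WeChat, and other platforms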
🔊 Text-Prompted Generative Audio Model
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Open-Sora: Democratizing Efficient Video Production for All