Stars
Let your Claude be able to think
①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.
A comprehensive collection of IQA papers
B-Llama3o: a Llama 3 with vision and audio understanding, as well as text, audio, and animation data output.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Text-to-Music Generation with Rectified Flow Transformers
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A Python project that uses OpenCV to detect the gender and age of a person (face) in a picture or through a webcam.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
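Regularly sharing high-quality, interesting, and practical open-source technical tutorials, developer tools, programming websites, and tech news from GitHub. A list of cool, interesting projects on GitHub.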
Generative models for conditional audio generation
This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
Code for fine-tuning ChatGLM-6B using low-rank adaptation (LoRA)