Stars
VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration
Automatically Update Text-to-Speech (TTS) Papers Daily Using GitHub Actions (Updates Every 12 Hours)
An Open-Sourced LLM-empowered Foundation TTS System
State-of-the-art zero-shot voice conversion & singing voice conversion with in-context learning
This is the official reproduction of FancyVideo.
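A chatbot built on large language models that supports integration with WeChat Official Accounts, WeCom (Enterprise WeChat) apps, Feishu, DingTalk, and more. Selectable models include GPT3.5/GPT-4o/GPT-o1/Claude/ERNIE Bot/iFlytek Spark/Tongyi Qianwen/Gemini/GLM-4/Kimi/LinkAI. It can handle text, voice, and images, access the operating system and the internet, and supports building customized enterprise customer-service bots on your own knowledge base.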
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
AI powered speech denoising and enhancement
Fuse ChatTTS with OpenVoice, upload a 10-second audio clip, and clone your personalized ChatTTS voice.
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Semantic Search on Wikipedia with Upstash Vector
A collection of awesome video generation studies.
Official implementation of "Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices" (ICML 2024).
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Run any ComfyUI workflow w/ ZERO setup.
Gradio Demo for ComfyDeploy
ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, th…
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)