Stars
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
🦜🔗 Build context-aware reasoning applications
"Muzu" is a TypeScript npm library for server-side HTTP request handling and routing.
Open, Multi-modal Catalog for Data & AI
A native PyTorch Library for large model training
Reverse Engineering: Decompiling Binary Code with Large Language Models
Inference and training library for high-quality TTS models.
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
You were probably looking for our website... this is it. We moved our website here, so you can see the insides of how we work.
Instant voice cloning by MIT and MyShell.
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
A multi-voice TTS system trained with an emphasis on quality
Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.
Foundational Models for State-of-the-Art Speech and Text Translation
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, try Sync Labs.
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
Official Code for DragGAN (SIGGRAPH 2023)
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Robust Speech Recognition via Large-Scale Weak Supervision