Stars
Train transformer language models with reinforcement learning.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
On-device AI across mobile, embedded and edge for PyTorch
Implementation of the proposed minGRU in PyTorch
The First Multimodal Search Engine Pipeline and Benchmark for LMMs
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Malfunctioning Industrial Machine Investigation and Inspection
An Open-Sourced LLM-empowered Foundation TTS System
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.
Text-to-Music Generation with Rectified Flow Transformers
nanobind: tiny and efficient C++/Python bindings
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Papers for LLM and foundation models for time series analytics
LlamaVoice is a Llama-based large voice generation model that supports both inference and training.
Inference and training library for high-quality TTS models.
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"
Turns Data and AI algorithms into production-ready web applications in no time.
MiniSora: a community that aims to explore the implementation path and future development direction of Sora.
Transform datasets at scale. Optimize datasets for fast AI model training.
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Refactored and updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
A generative speech model for daily dialogue.
Perplexica is an AI-powered search engine, an open-source alternative to Perplexity AI.