Stars
Tools to download and clean up Common Crawl data
Alibaba Java Diagnostic Tool Arthas
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Phrase-Based & Neural Unsupervised Machine Translation
formiel / fairseq
Forked from facebookresearch/fairseq. Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Code for the paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022)
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not …
Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
Whisper realtime streaming for long speech-to-text transcription and translation
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Multilingual Voice Understanding Model
Speech, Language, Audio, Music Processing with Large Language Model
"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"
Foundational Models for State-of-the-Art Speech and Text Translation
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-V…
For releasing code related to compression methods for transformers, accompanying our publications
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Bolt is a deep learning library with high performance and heterogeneous flexibility.