Starred repositories
Please see the README file as well as our 2019 EMNLP paper.
A dataset consisting of 502 English dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz …
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Agent framework and applications built upon Qwen2, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
A Gradio web UI for Large Language Models.
SGLang is a fast serving framework for large language models and vision language models.
This repo includes a curated collection of prompts for using ChatGPT more effectively.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Speech, Language, Audio, Music Processing with Large Language Model
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
This library provides common speech features for ASR including MFCCs and filterbank energies.
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
A guidance language for controlling large language models.
kaldi-asr/kaldi is the official location of the Kaldi project.
Tools for handling speech data in machine learning projects.
Speech-to-text server framework with next-gen Kaldi
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, L…