Stars
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processin…
Official implementation of History Aware Multimodal Transformer for Vision-and-Language Navigation (NeurIPS'21).
[ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
Large Scale Chinese Corpus for NLP
Word Discovery in Visually Grounded, Self-Supervised Speech Models
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Phoneme segmentation using pre-trained speech models
American Sign Language to Speech Application.
Self-supervised speech representations
Vector-Quantized Contrastive Predictive Coding for Acoustic Unit Discovery and Voice Conversion
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation (INTERSPEECH 2020)
Global Rhythm Style Transfer Without Text Transcriptions
Bottom-up feature extractor implemented in PyTorch.
An open-source NLP research library, built on PyTorch.
Pretrained ConvNets for PyTorch: NASNet, ResNeXt, ResNet, InceptionV4, InceptionResnetV2, Xception, DPN, etc.
Unsupervised word segmentation and clustering of speech
💬 Command-line translator using Google Translate, Bing Translator, Yandex.Translate, etc.
Data and code for grapheme-to-phoneme transducers in lots of languages