- The University of Texas at Austin
- Austin, TX
- https://jasonppy.github.io/
- @PuyuanPeng
- in/puyuan-peng-a5ab8a29b
Highlights
- Pro
Stars
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024]
Inference and training library for high-quality TTS models.
Practice tasks for the CompLING lab internship application.
The official repository of Dynamic-SUPERB.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models."
The best way to write secure and reliable applications. Write nothing; deploy nowhere.
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 and 2024 conferences. Explore the latest advances in speech and language processing.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation model.
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Phoneme segmentation using pre-trained speech models
Layer-wise analysis of self-supervised pre-trained speech representations