jasonppy

Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models

Python · 35 stars · 1 fork · Updated Oct 10, 2024
Python · 6,676 stars · 506 forks · Updated Oct 31, 2024

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Python · 15 stars · 2 forks · Updated Oct 1, 2024

Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence

Python · 9 stars · Updated Jun 14, 2024

🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

Python · 32 stars · 2 forks · Updated Oct 12, 2024

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024]

Python · 99 stars · 9 forks · Updated Oct 13, 2024

Inference and training library for high-quality TTS models.

Python · 4,588 stars · 465 forks · Updated Oct 30, 2024

Practice tasks for the CompLING lab internship application.

TeX · 7 stars · Updated Aug 19, 2024

The official repository of Dynamic-SUPERB.

Python · 159 stars · 89 forks · Updated Nov 7, 2024

Package pymcd

Python · 28 stars · 2 forks · Updated Sep 8, 2022

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python · 35,262 stars · 4,301 forks · Updated Aug 16, 2024

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python · 629 stars · 45 forks · Updated Oct 27, 2024

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python · 160 stars · 19 forks · Updated May 29, 2024

This is the code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on

Python · 470 stars · 40 forks · Updated Jun 9, 2024

The best way to write secure and reliable applications. Write nothing; deploy nowhere.

Dockerfile · 60,846 stars · 4,716 forks · Updated Aug 7, 2024

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python · 7,659 stars · 761 forks · Updated Feb 11, 2024

Text-to-Audio/Music Generation

Python · 2,297 stars · 179 forks · Updated Sep 29, 2024

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Python · 320 stars · 27 forks · Updated Feb 21, 2024

INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processin…

638 stars · 42 forks · Updated Aug 9, 2024

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python · 20,919 stars · 2,137 forks · Updated Jul 18, 2024

Dataset for the task of visual ASR error correction

5 stars · Updated Jun 6, 2023

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

Python · 132 stars · 11 forks · Updated Jan 16, 2024

Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

Python · 30 stars · 5 forks · Updated Aug 27, 2023

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Python · 382 stars · 36 forks · Updated Apr 24, 2024

My research experience

5,887 stars · 346 forks · Updated Nov 3, 2024

Code for the C2KD paper (ICASSP 2023)

Python · 16 stars · 1 fork · Updated May 15, 2023

Phoneme segmentation using pre-trained speech models

Python · 51 stars · 10 forks · Updated Nov 4, 2022

Layer-wise analysis of self-supervised pre-trained speech representations

Python · 96 stars · 16 forks · Updated Oct 18, 2024