Generation scripts for EARS-WHAM and EARS-Reverb

Python 21 3 Updated Sep 16, 2024

FMA: A Dataset For Music Analysis

Jupyter Notebook 2,241 439 Updated Jan 5, 2023

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

402 12 Updated Jun 18, 2024

PAM is a no-reference audio quality metric for audio generation tasks

Python 48 5 Updated Jul 19, 2024

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Python 680 117 Updated Mar 8, 2024

Speech Human Evaluation Estimation Toolkit (SHEET)

Python 32 2 Updated Nov 7, 2024
Python 6,694 508 Updated Oct 31, 2024

An efficient video loader for deep learning with smart shuffling that's super easy to digest

C++ 1,881 161 Updated Jul 17, 2024

A simple library for Fréchet Audio Distance (FAD) calculation

Python 145 21 Updated Oct 13, 2024

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 6,292 777 Updated Nov 8, 2024

VideoSys: An easy and efficient system for video generation

Python 1,760 120 Updated Nov 10, 2024

Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"

Python 366 9 Updated Sep 2, 2024

An invertible and differentiable implementation of the Constant-Q Transform (CQT).

Python 54 3 Updated Dec 9, 2022

Efficient synchronization from sparse cues

Python 28 4 Updated Apr 25, 2024

TorchCFM: a Conditional Flow Matching library

Python 1,203 98 Updated Oct 9, 2024

Expressive Anechoic Recordings of Speech (EARS)

Python 130 7 Updated Jun 25, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,069 87 Updated Aug 6, 2024
Python 100 8 Updated Oct 7, 2024

Audio Normalization for Python/ffmpeg

Python 1,274 117 Updated Oct 22, 2024

A MUSHRA-compliant experiment software based on the Web Audio API

JavaScript 351 137 Updated Aug 9, 2024

Audio Dataset for training CLAP and other models

Python 632 53 Updated Feb 5, 2024

Audiogen Codec

Python 127 11 Updated Jul 9, 2024

Confidence interval computation for evaluation in machine learning using the bootstrapping approach

Jupyter Notebook 66 8 Updated Apr 5, 2024

Generative models for conditional audio generation

Python 2,704 256 Updated Nov 5, 2024

Foundational model for human-like, expressive TTS

Python 3,878 658 Updated Jul 30, 2024

Unofficial implementation of miipher

Python 111 15 Updated Apr 19, 2024

Official Code for DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing (CVPR 2024)

Python 416 43 Updated Apr 24, 2024

Official Code for DragGAN (SIGGRAPH 2023)

Python 35,706 3,454 Updated May 18, 2024

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Python 179 24 Updated Sep 13, 2024
Python 34 3 Updated Oct 9, 2024