- Meta
- New York City, NY, US
- https://bigpon.github.io/
Stars
Generation scripts for EARS-WHAM and EARS-Reverb
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
PAM is a no-reference audio quality metric for audio generation tasks
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
An efficient video loader for deep learning with smart shuffling that's super easy to digest
A simple library for Fréchet Audio Distance (FAD) calculation
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
VideoSys: An easy and efficient system for video generation
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
An invertible and differentiable implementation of the Constant-Q Transform (CQT).
TorchCFM: a Conditional Flow Matching library
Expressive Anechoic Recordings of Speech (EARS)
Lumina-T2X is a unified framework for Text to Any Modality Generation
Audio Normalization for Python/ffmpeg
A MUSHRA-compliant experiment software based on the Web Audio API
Audio Dataset for training CLAP and other models
Confidence interval computation for evaluation in machine learning using the bootstrapping approach
Generative models for conditional audio generation
Foundational model for human-like, expressive TTS
Official Code for DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing (CVPR 2024)
Official Code for DragGAN (SIGGRAPH 2023)
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation