Chinese and English multimodal conversational language model

Python 4,080 416 Updated Aug 23, 2024

🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps

Python 139 17 Updated Apr 23, 2024

An open-source framework for training large multimodal models.

Python 3,697 281 Updated Aug 31, 2024

UT-Sarulab MOS prediction system using SSL models

Python 177 13 Updated Apr 11, 2024

Python parser and tools for MUSDB18 Music Separation Dataset

Python 161 34 Updated Nov 24, 2023

Writing AI Conference Papers: A Handbook for Beginners

1,152 38 Updated Sep 26, 2024

A family of diffusion models for text-to-audio generation.

Python 1,010 79 Updated Jul 3, 2024

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE)

JavaScript 32 2 Updated Oct 13, 2023

Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".

Python 46 2 Updated Jul 22, 2022

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Python 1,125 108 Updated May 10, 2024

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.

Python 180 10 Updated Oct 2, 2024

LLM101n: Let's build a Storyteller

29,368 1,608 Updated Aug 1, 2024

Python 134 22 Updated Oct 13, 2024

Code for GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts

Python 13 1 Updated Jun 2, 2024

Audio Codec Speech processing Universal PERformance Benchmark

Python 208 22 Updated Sep 28, 2024

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Python 578 80 Updated Dec 27, 2023

Metadata, scripts and baselines for the MTG-Jamendo dataset

Python 267 38 Updated Jul 9, 2024

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 6,144 542 Updated May 31, 2024

Official implementation of "Separate Anything You Describe"

Python 1,598 116 Updated Mar 31, 2024

Official Implementation of EnCLAP (ICASSP 2024)

Python 88 5 Updated Jun 2, 2024

VGGSound: A Large-scale Audio-Visual Dataset

Python 287 32 Updated Sep 13, 2021

The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022

Python 186 31 Updated Jul 14, 2022

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python 25,563 5,289 Updated Oct 16, 2024

LaTeX template for USTC thesis

TeX 1,616 399 Updated Oct 15, 2024

Source code for the paper 'Audio Captioning Transformer'

Jupyter Notebook 48 3 Updated Jan 18, 2022

AudioLDM training, finetuning, evaluation and inference.

Python 200 39 Updated Jun 2, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 19,866 2,532 Updated Oct 10, 2024

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

Python 966 214 Updated Aug 28, 2023

Official repo for WavCraft, an AI agent for audio creation and editing

Python 650 96 Updated Sep 13, 2024

Perceptual Quality Estimator for speech and audio

C++ 687 124 Updated Aug 2, 2024