The Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events.

641 151 Updated May 21, 2018

Digital Signal Processing (Keio University)

HTML 7 Updated Jun 30, 2024
Python 1 Updated May 28, 2024

Awesome Papers related to Mamba.

1,049 57 Updated Aug 11, 2024

"Deep Learning from Scratch ❺" (ゼロから作る Deep Learning ❺; O'Reilly Japan, 2024)

Jupyter Notebook 226 35 Updated May 21, 2024

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Python 2,745 175 Updated Aug 2, 2024

JEPAs for audio representation learning

Python 12 Updated Apr 15, 2024

Awesome speech/audio LLMs, representation learning, and codec models

554 26 Updated May 29, 2024

MU-LLaMA: Music Understanding Large Language Model

Python 220 16 Updated Mar 25, 2024

Reading list for research topics in Sound AI

159 8 Updated Aug 8, 2024

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Jupyter Notebook 79 11 Updated Aug 17, 2024

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

511 28 Updated Aug 3, 2024

Visualization toolbox for Sound Event Detection

Python 111 29 Updated Feb 26, 2024

SoundFile is an audio library based on libsndfile, CFFI, and NumPy

Python 693 107 Updated Jul 27, 2024

Audio Captioning datasets for PyTorch.

Python 94 6 Updated Aug 2, 2024

Multi-lingual AudioCaps

7 Updated Nov 20, 2023

Music Audio Representation Benchmark for Universal Evaluation

Python 82 3 Updated May 16, 2024
Python 12 Updated Feb 26, 2024

Analyzing partial dimensional collapse in non-contrastive self-supervised learning. "Understanding Collapse in Non-Contrastive Siamese Representation Learning." In ECCV, 2022.

Jupyter Notebook 12 1 Updated Nov 12, 2023

Mi-Go is an open-source test framework designed to evaluate and compare the accuracy of speech-to-text models on a YouTube dataset.

Python 10 2 Updated Jul 2, 2024
Python 2 Updated Aug 22, 2023

[ACM MM'23] UMMAFormer: A Universal Multimodal-adaptive Transformer Framework For Temporal Forgery Localization

Python 46 1 Updated May 16, 2024

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

383 25 Updated Jul 17, 2024

Code for evaluating Japanese pretrained models provided by NTT Ltd.

Python 240 23 Updated Jun 21, 2023

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Python 304 25 Updated Feb 21, 2024

Text classification using LLMs and LoRA

Python 84 5 Updated Jul 22, 2023

✨✨Latest Advances on Multimodal Large Language Models

11,332 740 Updated Aug 26, 2024