- Kawasaki, Kanagawa, Japan
- @nizumical
- https://scholar.google.co.jp/citations?user=dTEKquEAAAAJ&hl=en
Stars
The Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events.
"Deep Learning from Scratch ❺" (『ゼロから作る Deep Learning ❺』, O'Reilly Japan, 2024)
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
JEPAs for audio representation learning
Awesome speech/audio LLMs, representation learning, and codec models
MU-LLaMA: Music Understanding Large Language Model
Reading list for research topics in Sound AI
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Visualization toolbox for Sound Event Detection
SoundFile is an audio library based on libsndfile, CFFI, and NumPy
Music Audio Representation Benchmark for Universal Evaluation
Analyzing partial dimensional collapse in non-contrastive self-supervised learning. "Understanding Collapse in Non-Contrastive Siamese Representation Learning." In ECCV, 2022.
Mi-Go is an open-source test framework designed to evaluate and compare the accuracy of speech-to-text models on a YouTube dataset.
[ACM MM'23] UMMAFormer: A Universal Multimodal-adaptive Transformer Framework For Temporal Forgery Localization
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
Code for evaluating Japanese pretrained models provided by NTT Ltd.
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
✨✨Latest Advances on Multimodal Large Language Models