SLT 2024 Challenge: Post-ASR-Speaker-Tagging

Python 13 1 Updated Jun 16, 2024

Cross-Speaker Encoding Network for Multi-talker Speech Recognition

Python 10 1 Updated Aug 24, 2024

Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Python 11 1 Updated Jul 16, 2024

Repository for "LLM-based speaker diarization correction: A generalizable approach" paper

Jupyter Notebook 10 Updated Jul 31, 2024
Python 1 1 Updated Jun 11, 2024

Single-blind supplementary materials for NeurIPS 2023 submission

Python 58 4 Updated Aug 21, 2024

MooER: Open-sourced LLM for audio understanding trained on 80,000 hours of data

Python 121 7 Updated Sep 4, 2024

Foal-Net: Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition

Python 2 Updated Oct 1, 2024

[ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM

Python 182 22 Updated Jul 20, 2023
Python 2 Updated May 11, 2023
Python 1 Updated Mar 28, 2024
7 Updated Jul 4, 2024
8 Updated Aug 15, 2024

Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild, 7th ABAW Challenge: Compound Expression (CE) Recognition Challenge

Python 3 Updated Sep 20, 2024
Python 4 1 Updated Aug 11, 2024
Python 4 Updated Jan 18, 2024

The official implementation of InstructERC

Python 121 7 Updated Jul 16, 2024
Python 7 Updated Sep 22, 2024

[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 70 3 Updated Aug 22, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 6,205 659 Updated Sep 30, 2024

Generative Fusion Decoding (GFD) is a novel framework for integrating Large Language Models (LLMs) into multi-modal text recognition systems like ASR and OCR, improving performance and efficiency b…

Python 63 8 Updated Sep 2, 2024

PyTorch implementation for Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition

Python 53 13 Updated Nov 16, 2022

This repo contains a list of the 44,998 most common Japanese words in order of frequency, as determined by the University of Leeds Corpus.

66 11 Updated Sep 13, 2018
Python 1 Updated Jul 22, 2024
Python 43 1 Updated Jun 27, 2024

The implementation of CubeMLP

Python 40 5 Updated May 8, 2023