XiaoYuanJun-zy

The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis”

Python 79 12 Updated Dec 20, 2022

Large-Scale Selfie Video Dataset (L-SVD): A Benchmark for Emotion Recognition

407 42 Updated Jun 10, 2024

Code for the ACL 2023 paper "DualGATs: Dual Graph Attention Networks for Emotion Recognition in Conversations"

Python 58 12 Updated Aug 5, 2023

Graph Attention Networks (https://arxiv.org/abs/1710.10903)

Python 3,161 642 Updated Apr 9, 2022

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 4,577 363 Updated Aug 10, 2024

A pipeline that reads lips and generates speech for the read content, i.e., lip-to-speech synthesis.

Python 73 19 Updated Nov 25, 2021

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python 3,912 372 Updated Aug 16, 2024
Python 18 1 Updated Aug 2, 2023

All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment

Python 11 Updated Dec 27, 2023

Reading list for research topics in multimodal machine learning

5,778 836 Updated Jun 19, 2024

Event-based Motion Deblurring with Modality-Aware Decomposition and Recomposition

Python 4 Updated Jul 8, 2024
Python 2 Updated Jan 5, 2024

Expression Snippet Transformer for Robust Video-based Facial Expression Recognition

Python 13 2 Updated Jan 27, 2024

[CVPR 2023] This is the official implementation of "Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network"

Python 28 Updated Jul 10, 2024

MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition (ACM MM 2023)

Python 89 12 Updated Jan 16, 2024

Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Python 177 17 Updated Aug 16, 2024

deeplearning.ai (notes and resources for Andrew Ng's deep learning courses)

HTML 17,795 5,860 Updated Apr 29, 2022

[ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"

Python 30 2 Updated Aug 13, 2024

Math OCR model that outputs LaTeX and markdown

Python 681 55 Updated Jun 30, 2024

Geometric Computer Vision Library for Spatial AI

Python 9,713 949 Updated Aug 15, 2024

Visual Speech Recognition For Low-Resource Languages with Automatic Labels

Python 5 Updated May 21, 2024

WildVSR

Python 11 Updated Dec 13, 2023

A self-supervised learning framework for audio-visual speech

Python 822 130 Updated Dec 7, 2023

💎 1MB lightweight face detection model

Python 7,118 1,538 Updated Dec 29, 2023

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 1,015 89 Updated Aug 12, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 5,509 597 Updated Aug 16, 2024

Official Implementation of Visual Transformer Pooling for Lip reading

Roff 35 5 Updated Aug 8, 2022

Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)

Python 50 3 Updated Jul 18, 2024

Auto-AVSR: Lip-Reading Sentences Project

Python 154 40 Updated Apr 16, 2024
Python 286 25 Updated May 19, 2024