Stars
The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis”
Large-Scale Selfie Video Dataset (L-SVD): A Benchmark for Emotion Recognition
Code for the ACL 2023 paper "DualGATs: Dual Graph Attention Networks for Emotion Recognition in Conversations"
Graph Attention Networks (https://arxiv.org/abs/1710.10903)
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
A pipeline that reads lips and generates speech for the content read, i.e., lip-to-speech synthesis.
Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Reading list for research topics in multimodal machine learning
Event-based Motion Deblurring with Modality-Aware Decomposition and Recomposition
Expression Snippet Transformer for Robust Video-based Facial Expression Recognition
[CVPR 2023] This is the official implementation of "Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network"
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition (ACM MM 2023)
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
deeplearning.ai (notes and resources for Andrew Ng's deep learning courses)
[ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
Math OCR model that outputs LaTeX and markdown
Geometric Computer Vision Library for Spatial AI
Visual Speech Recognition For Low-Resource Languages with Automatic Labels
A self-supervised learning framework for audio-visual speech
💎 1MB lightweight face detection model
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Official Implementation of Visual Transformer Pooling for Lip reading
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)