Starred repositories
Production repository for the all-new Advantage360 Professional using ZMK engine
A programmer's guide to living longer
HIP: C++ Heterogeneous-Compute Interface for Portability
A modular graph-based Retrieval-Augmented Generation (RAG) system
Superfast AI decision making and intelligent processing of multi-modal data.
The official Python library for the OpenAI API
The AI-native open-source embedding database
Fast inference from large language models via speculative decoding
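This project implements Wav2Lip video lip synthesis on top of SadTalker. It generates lip movements driven by speech, taking a video file as input, and applies a configurable enhancement to the face region to sharpen the synthesized lip (face) area and improve the clarity of the generated lips. It uses DAIN, a deep-learning frame-interpolation algorithm, to add frames to the generated video, filling in the motion transitions between frames so the synthesized lips look smoother, more realistic, and more natural.
A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train, covering base models, domain-specific fine-tunes and applications, datasets, and tutorials.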
Real-time face swap for PC streaming or video calls
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Windows Precision Touchpad Driver Implementation for Apple MacBook / Magic Trackpad
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Keep constant eye contact in Zoom conferences
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Using modified BiSeNet for face parsing in PyTorch
Transformer: PyTorch Implementation of "Attention Is All You Need"
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
One minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting