Lip Sync - Neural Network Rhubarb Replication

This is a Python neural network replication of Rhubarb Lip Sync, designed to enable complex lip movement for real-time chatbots.

This project utilizes a simple neural network trained on pairs of spoken audio and the corresponding Rhubarb Lip Sync output, approximating Rhubarb's lip movements with around 75% accuracy. This level of accuracy is sufficient to convey a sense of realism in most applications.
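The exact architecture isn't documented in this README; purely as a hypothetical sketch of what a simple two-layer classifier over per-frame audio features could look like (all names and sizes below are illustrative placeholders, not the actual model):

```python
import torch
import torch.nn as nn

# Hypothetical sketch only: a two-layer MLP that maps a vector of audio
# features for one frame to a score per mouth shape. The real model,
# feature extraction, and layer sizes live in this repository's code.
class MouthShapeNet(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_shapes=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),  # layer 1
            nn.ReLU(),
            nn.Linear(hidden, n_shapes),    # layer 2: one logit per mouth shape
        )

    def forward(self, x):
        # x: (batch, n_features) -> (batch, n_shapes) logits
        return self.net(x)
```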

Please note that if your application does not require real-time performance, you may want to use Rhubarb Lip Sync directly.

How to Use

Inference

For now, only inference is supported, as the training code is being heavily refactored. If for some reason you need to train your own model and can't wait, let me know. For inference, use the following command:

python .\inference.py --wav_file_name .\001.wav --model_name model_full_dataset_2layers.pth  
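The output format isn't reproduced here; assuming it mirrors Rhubarb's own TSV export (one cue per line: a start time in seconds, a tab, and a mouth shape letter), the result for a short clip might look something like this (values purely illustrative):

```
0.00	G
0.21	B
0.38	E
0.52	F
```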

Training

If you wish to train your own model, you can do so as well. The program looks for 41 kHz WAV files in the "wavs" directory and matching outputs from Rhubarb (the command-line program) in the "texts" directory; WAVs and TXTs must share the same filename ("001.wav" and "001.txt"). I had better luck not using the extended mouth shapes (except for 'X'), so the training program is set to exclude them; if you wish to include them when training, set the OUTPUT_SIZE variable to 9. If you decide to use the extended mouth shape "X", please find-and-replace it with "G", or with "I" if you are using the extended mouth shapes.
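As a concrete sketch of the data layout described above (helper names here are illustrative, not the training script's actual API, and the parsing assumes Rhubarb's default TSV cue format of one `<seconds><TAB><shape>` line per cue):

```python
from pathlib import Path

# Non-extended shapes A-F, with 'X' remapped to 'G' per the note above;
# widen this list and set OUTPUT_SIZE = 9 to keep the extended shapes.
SHAPES = list("ABCDEFG")
OUTPUT_SIZE = len(SHAPES)

def find_pairs(wav_dir="wavs", text_dir="texts"):
    """Pair each 41 kHz WAV with the Rhubarb output sharing its filename."""
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        txt_path = Path(text_dir) / f"{wav_path.stem}.txt"
        if txt_path.exists():
            yield wav_path, txt_path

def parse_rhubarb_txt(txt_path):
    """Read Rhubarb TSV cues into (time_in_seconds, shape_index) tuples."""
    cues = []
    for line in Path(txt_path).read_text().splitlines():
        time_str, shape = line.split()
        shape = shape.replace("X", "G")  # remap 'X' as suggested above
        cues.append((float(time_str), SHAPES.index(shape)))
    return cues
```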

To-Do List

  1. Convert from using .pth to using SafeTensors (see the sketch below).
  2. Add a video example to README.md.
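For the first item, the conversion might be as simple as the following sketch, assuming the .pth file holds a plain state dict (if it stores a full pickled model instead, call .state_dict() on the loaded object first):

```python
import torch
from safetensors.torch import save_file

# Load the existing PyTorch checkpoint; map_location="cpu" avoids
# needing a GPU just to convert the file.
state_dict = torch.load("model_full_dataset_2layers.pth", map_location="cpu")

# Write the same tensors back out in the safetensors format.
save_file(state_dict, "model_full_dataset_2layers.safetensors")
```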

Current Status

The code is currently undergoing refactoring and users may encounter errors, particularly when attempting to train their own models. However, the provided model (model_full_dataset_2layers.pth) should be satisfactory for most purposes. It's been trained on over 80 GB of WAV files from a variety of sources, providing a comprehensive and versatile foundation for lip-syncing tasks.

License and Use

This code is available under the MIT license and is free for anyone to use without obligation. However, I would be delighted if you'd drop me a line to let me know if and how you're using it!

Contributions and Feedback

Please feel free to contribute to this project or provide feedback by opening an issue or pull request on GitHub. Your insights are greatly appreciated!
