Hey guys, let me introduce LipSyncInsight, a lip reading application that recognizes the user's lip movements and predicts the spoken text.
The model is trained on a subset of the GRID corpus dataset: gdown is used to download one speaker's data (out of the full 34 speakers) from Google Drive.
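As a rough sketch of the download step, the Drive file can be fetched with gdown. The file ID and helper name below are placeholders, not the real dataset ID, and the actual download call is left commented out since it needs network access:

```python
# Sketch of the gdown download step (file ID is a placeholder, helper name
# drive_url is hypothetical, not from the repo).

def drive_url(file_id: str) -> str:
    """Build the direct-download URL format that gdown accepts."""
    return f"https://drive.google.com/uc?id={file_id}"

# import gdown
# gdown.download(drive_url("<FILE_ID>"), "grid_subset.zip", quiet=False)
```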
- Python / TensorFlow / Keras -> data preparation, pipeline, model training & testing.
- Streamlit -> web application.
- LipNet -> lip reading model architecture idea.
- ffmpeg -> video file format conversion.
- OpenCV -> video capture and frame processing.
- gdown -> downloading the dataset.
- imageio -> making GIFs.
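The data-preparation step has to map each character of a GRID transcript to an integer id and back before training. A pure-Python sketch of that mapping (the vocabulary, helper names, and the choice of reserving id 0 for the CTC blank are all assumptions, not taken from the repo):

```python
# Illustrative char<->id mapping for GRID transcripts.
# Assumption: id 0 is reserved for the CTC blank token.
vocab = list("abcdefghijklmnopqrstuvwxyz' ")

char_to_num = {c: i for i, c in enumerate(vocab, start=1)}
num_to_char = {i: c for c, i in char_to_num.items()}

def encode(text: str) -> list[int]:
    """Map a transcript like 'bin blue at f two now' to integer ids."""
    return [char_to_num[c] for c in text]

def decode(ids: list[int]) -> str:
    """Map integer ids back to text, skipping the blank id 0."""
    return "".join(num_to_char[i] for i in ids if i != 0)
```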
Here, instead of the Bi-GRU layers used in the original LipNet, we use Bi-LSTM layers.
- https://keras.io/examples/audio/ctc_asr/
- https://github.com/rizkiarm/LipNet
- Lip reading.pdf
- Lip reading1.pdf