An LSTM network trained on dance videos, with audio (songs) as input and estimated human pose coordinates as output. The trained LSTM models are then used to generate dance videos from songs.

keshavoct98/DANCING-AI

DANCING-AI

  1. Extract pose coordinates from dance videos using OpenPose human pose estimation.
  2. Train an LSTM network on the extracted data, with the song audio as input and the pose coordinates as output.
  3. Use the trained LSTM to predict dance coordinates for the remainder of the song (95% of the audio is used for training and the remaining 5% for prediction).
  4. Render the output video by joining the predicted coordinates into dancing human stick figures.
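
The 95%/5% split in step 3 can be sketched as a chronological cut over frame-aligned sequences. This is a minimal illustration and not the repository's code: the function name `chronological_split`, the toy data, and the assumption that there is one audio feature vector per pose frame are all hypothetical.

```python
def chronological_split(audio_features, pose_coords, train_fraction=0.95):
    """Split frame-aligned sequences into a leading training part and a trailing held-out part.

    Hypothetical helper: assumes audio_features[i] and pose_coords[i]
    describe the same time step.
    """
    if len(audio_features) != len(pose_coords):
        raise ValueError("audio features and pose coordinates must be frame-aligned")
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # No shuffling: the model trains on the start of the song and
    # predicts the end of it.
    split = int(round(len(audio_features) * train_fraction))
    train = (audio_features[:split], pose_coords[:split])
    held_out = (audio_features[split:], pose_coords[split:])
    return train, held_out


# Toy data: 200 time steps, one made-up audio feature and one (x, y) pair per step.
audio = [[float(t)] for t in range(200)]
poses = [[float(t), float(t) + 1.0] for t in range(200)]
(train_x, train_y), (test_x, test_y) = chronological_split(audio, poses)
print(len(train_x), len(test_x))
```

Slicing keeps the time order intact, which matters here because the held-out 5% is the tail of the same song rather than a random sample of frames.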

Requirements

   opencv-contrib-python==4.7.0.72
   pandas==2.0.1
   librosa==0.10.0.post2
   moviepy==1.0.3
   yt-dlp==2023.3.4
   tensorflow==2.12.0
   keras==2.12.0

Training/Demo

  1. Run get_data.py to download videos and audio files to the data folder. To download additional videos, add their YouTube links to the "video_links.txt" file. Alternatively, copy videos ('.mp4' format) and audio files ('.wav' format) directly into the data folder.
  2. Download the pretrained pose estimation weights, pose_iter_440000.caffemodel, from here and save the file in the "models" folder.
  3. Run main.py to train the LSTM and display the predicted dance video.
 python main.py --video "path to input video" --audio "path to input audio" --background "path to background image" --display
 Example - python main.py --video data/0.mp4 --audio data/0.wav --background inputs/bg0.jpg --display
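
For readers who want to see how the four flags in the command above fit together, here is a minimal argparse sketch. It is an assumption about how such a command line could be parsed, not a copy of main.py: the `build_parser` name, the help strings, the choice to make the three paths required, and treating `--display` as a boolean switch are all hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the example command."""
    parser = argparse.ArgumentParser(
        description="Train the LSTM and render the predicted dance video."
    )
    parser.add_argument("--video", required=True, help="path to input video (.mp4)")
    parser.add_argument("--audio", required=True, help="path to input audio (.wav)")
    parser.add_argument("--background", required=True, help="path to background image")
    # Assumed to be an on/off switch because the example passes it without a value.
    parser.add_argument("--display", action="store_true", help="show the output video")
    return parser


# Parse the README's example invocation from a list instead of sys.argv.
args = build_parser().parse_args(
    ["--video", "data/0.mp4", "--audio", "data/0.wav",
     "--background", "inputs/bg0.jpg", "--display"]
)
print(args.video, args.audio, args.background, args.display)
```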

   Note: If your GPU has 3 GB of RAM or less, reduce the memory limit in this line to a value below your GPU RAM.
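
As a rough illustration of what a GPU memory limit looks like in TensorFlow 2.x, the sketch below caps the first visible GPU using `tf.config.set_logical_device_configuration`. It is not the line from the repository that the note refers to: the helper name `limit_gpu_memory` and the 2048 MB value are assumptions, and the import is wrapped so the snippet also runs on machines without TensorFlow or without a GPU.

```python
def limit_gpu_memory(memory_limit_mb):
    """Cap TensorFlow's memory use on the first GPU; return True if a cap was applied.

    Hypothetical helper. It must run before TensorFlow initializes the GPU,
    otherwise TensorFlow raises a RuntimeError.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow is not installed, nothing to configure.

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return False  # CPU-only machine, nothing to configure.

    # memory_limit is given in megabytes.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
    )
    return True


# Example: on a GPU with 3 GB of RAM, pick a value below 3072 MB.
applied = limit_gpu_memory(2048)
print("GPU memory limit applied:", applied)
```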

Pose estimation using openpose

Predictions

References

  1. https://www.learnopencv.com/deep-learning-based-human-pose-estimation-using-opencv-cpp-python/
  2. https://github.com/CMU-Perceptual-Computing-Lab/openpose
  3. https://python-pytube.readthedocs.io/en/latest/
  4. https://zulko.github.io/moviepy/
  5. https://librosa.org/librosa/
  6. https://www.youtube.com/channel/UCX9y7I0jT4Q5pwYvNrcHI_Q
