This repository contains a simulation architecture, built on the Gazebo environment, for applying the reinforcement learning algorithm DDPG to generate bipedal walking patterns for the robot.
Autonomous walking of the bipedal robot is achieved using a reinforcement learning algorithm called Deep Deterministic Policy Gradient (DDPG) [1]. DDPG utilises an actor-critic learning framework to learn control policies in continuous action spaces.
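To make the actor-critic split concrete, here is a minimal, framework-free sketch of the quantities DDPG optimises, following Lillicrap et al.: the critic Q(s, a) is regressed toward a Bellman target computed with slowly tracking target networks, the actor μ(s) is moved to increase Q(s, μ(s)), and the target networks follow the online ones by Polyak averaging. The function names and the defaults γ = 0.99 and τ = 0.001 (values reported in that paper) are assumptions for illustration, not code taken from walker_controller. In a real implementation the actor and critic are neural networks and the gradients come from an autodiff library.

```python
from typing import Callable, List, Sequence

# Illustrative aliases (hypothetical): states and actions are plain float vectors here.
State = Sequence[float]
Action = List[float]
Actor = Callable[[State], Action]          # deterministic policy mu(s)
Critic = Callable[[State, Action], float]  # action-value estimate Q(s, a)


def critic_targets(rewards: Sequence[float], next_states: Sequence[State],
                   dones: Sequence[bool], target_actor: Actor,
                   target_critic: Critic, gamma: float = 0.99) -> List[float]:
    """Bellman targets y = r + gamma * Q'(s', mu'(s')); no bootstrap on terminal steps."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        # The target actor chooses the next action directly, so no max over
        # a continuous action space is needed (the key idea behind DDPG).
        a_next = target_actor(s_next)
        bootstrap = 0.0 if done else gamma * target_critic(s_next, a_next)
        targets.append(r + bootstrap)
    return targets


def actor_objective(states: Sequence[State], actor: Actor, critic: Critic) -> float:
    """Mean of Q(s, mu(s)) over a batch; the actor's parameters are updated to increase it."""
    return sum(critic(s, actor(s)) for s in states) / len(states)


def soft_update(target_params: List[float], online_params: Sequence[float],
                tau: float = 0.001) -> None:
    """Polyak averaging applied in place: theta' <- tau * theta + (1 - tau) * theta'."""
    for i, (t, o) in enumerate(zip(target_params, online_params)):
        target_params[i] = tau * o + (1.0 - tau) * t
```

With a small τ the target networks change slowly, which keeps the critic's regression targets stable between updates.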
The project details and the experimental results are documented in the research manuscript "Bipedal walking robot using Deep Deterministic Policy Gradient".
This project was developed at the Computational Intelligence Laboratory, IISc, Bangalore.
- walker_gazebo contains the robot model (both the .stl mesh files and the .urdf file) and the Gazebo launch file.
- walker_controller contains the reinforcement learning implementation of the DDPG algorithm for controlling the bipedal walking robot.
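Two supporting pieces appear in most DDPG implementations: an experience replay buffer, so that minibatches are not made of consecutive (highly correlated) simulation steps, and temporally correlated exploration noise added to the actor's output. The sketch below illustrates both as described by Lillicrap et al.; whether walker_controller structures them this way is not stated here. The class names and defaults (capacity 10^6, Ornstein-Uhlenbeck θ = 0.15, σ = 0.2, as reported in that paper) are assumptions for illustration.

```python
import math
import random
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

# One stored step (hypothetical layout): (state, action, reward, next_state, done).
Transition = Tuple[List[float], List[float], float, List[float], bool]


class ReplayBuffer:
    """Fixed-capacity FIFO memory of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity: int = 1_000_000, seed: Optional[int] = None) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state: Sequence[float], action: Sequence[float], reward: float,
            next_state: Sequence[float], done: bool) -> None:
        self._buffer.append((list(state), list(action), float(reward),
                             list(next_state), bool(done)))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement decorrelates consecutive simulation steps.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise, discretised per action dimension as
    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, action_dim: int, mu: float = 0.0, theta: float = 0.15,
                 sigma: float = 0.2, dt: float = 1.0, seed: Optional[int] = None) -> None:
        self.action_dim = action_dim
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self._rng = random.Random(seed)
        self.reset()

    def reset(self) -> None:
        # Restart the process at its mean, e.g. at the beginning of each episode.
        self.state = [self.mu] * self.action_dim

    def sample(self) -> List[float]:
        # Mean reversion pulls x toward mu; the Gaussian term adds correlated jitter.
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self._rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

During training, the command sent to the robot's joints would typically be μ(s) plus `OUNoise.sample()`, clipped to the joint limits, with each resulting transition stored via `ReplayBuffer.add` before a minibatch is drawn for the critic and actor updates.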
Note: Stable bipedal walking was achieved after training the model for over 41 hours on a system with an NVIDIA GeForce GTX 1050 Ti GPU. Visualization of the horizontal boom (attached to the waist) is turned off.
- [1] Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- [2] Silver, David, et al. "Deterministic Policy Gradient Algorithms." ICML (2014).
Arun Kumar & Dr. S N Omkar
Implement state-of-the-art RL algorithms (TRPO and PPO) for the same task, which will hopefully lead to faster training and shorter convergence time.