
Distributed PPO algorithm with the CARLA Reinforcement Learning Library

This library implements a distributed version of the Proximal Policy Optimization (PPO) algorithm on top of the carla_rllib environment. In addition, it contains an older implementation of an Asynchronous Actor-Critic algorithm that is no longer under development.

To get started with the CARLA simulator, click here.

To get started with the CARLA reinforcement learning library, click here.

Prerequisites

1. Install NumPy and PyTorch.

2. Follow the instructions here to install ROS Melodic.

3. Follow the instructions here to create a catkin workspace.

4. Clone this repository into the src folder of your catkin workspace:

cd catkin_ws/src
git clone https://github.com/50sven/ros_rl.git

5. Build your package with catkin_make:

cd catkin_ws
catkin_make
source devel/setup.bash

Note: The package must be rebuilt every time you change the code.

Get Started (with PPO)

1. Before starting training, configure the following files:

  • node_ppo.launch: configures the parameters of the algorithm and the environment
  • setting.config: configures the ROS parameters and the workstations used

2. Start training with the bash script train.sh:

cd catkin_ws/src/ros_carla_rllib/scripts/
./train.sh

Idea:

  • train.sh starts three types of nodes (MasterNode, EvalNode and EnvNode) as well as the CARLA servers. Each node/server runs in its own tmux session.
  • There is exactly one MasterNode and one EvalNode per training.
  • In contrast, there can be multiple EnvNodes.
  • Each EnvNode and the EvalNode run their own carla_rllib environment, which is uniquely assigned to one CARLA server.
  • Multiple trainings/nodes can run on the same workstation, provided each uses unique ports (see the sketch below).
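As a minimal illustration of the unique-port requirement, the following hypothetical Python sketch spaces out the RPC ports of several CARLA servers. CARLA's default RPC port is 2000 and each server additionally binds the next consecutive port for streaming; the constants and spacing here are assumptions for illustration, not values taken from this repository:

# Hypothetical sketch: assigning unique RPC ports to several CARLA servers,
# one per EnvNode plus one for the EvalNode. Not the repository's code.
BASE_PORT = 2000      # CARLA's default RPC port
NUM_ENV_NODES = 4     # assumed number of EnvNodes for this training run

# CARLA binds its RPC port and the following port, so space servers by 2
# to avoid collisions when they share a workstation.
ports = [BASE_PORT + 2 * i for i in range(NUM_ENV_NODES + 1)]
print(ports)  # [2000, 2002, 2004, 2006, 2008] -> last one for the EvalNode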

Note: The current implementation uses the trajectory environment from the carla_rllib, which was built for a specific use case. To fit this implementation to individual needs, one must adapt the PPO implementation and the online evaluation.

Concept (of PPO)

[Figure: Distributed PPO architecture overview]

Master:

  • stores rollout data received from the environment nodes
  • executes PPO optimization steps (see the objective sketch below)
  • logs training diagnostics
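
The optimization step follows the standard PPO clipped surrogate objective (Schulman et al., 2017). The following PyTorch sketch shows the core of such an update; it is a minimal illustration of the general technique, not this repository's exact code, and the function name and signature are assumptions:

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # probability ratio between the current policy and the rollout policy
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    # clipping removes the incentive to move the ratio outside [1-eps, 1+eps]
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # take the pessimistic bound and negate it, since optimizers minimize
    return -torch.min(unclipped, clipped).mean()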

Environment:

  • runs the current policy in the environment to collect rollout data and sends it to the MasterNode (sketched below)
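
A hypothetical sketch of such a rollout loop; the env/policy interface shown here is assumed for illustration and is not taken from this repository:

def collect_rollout(env, policy, horizon=2048):
    # gather one batch of on-policy experience for the MasterNode
    obs = env.reset()
    rollout = []
    for _ in range(horizon):
        action, log_prob = policy.act(obs)            # sample from current policy
        next_obs, reward, done, info = env.step(action)
        rollout.append((obs, action, log_prob, reward, done))
        obs = env.reset() if done else next_obs
    return rollout                                    # shipped to the master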

Evaluation:

  • runs online evaluations during training (sketched below)
  • logs evaluation metrics
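
A hypothetical sketch of an online evaluation step; again, the interface is assumed rather than taken from the repository:

def evaluate(env, policy, episodes=10):
    # run deterministic episodes and report the mean return for logging
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action, _ = policy.act(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)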
