Test project to validate whether tch-rs and Rust are viable for the full training side of reinforcement learning, specifically Proximal Policy Optimization (PPO). It uses a modified version of a vectorized environment from a personal AI bot project (currently private) built on the rlgym-sim-rs gym, with minor structural inspiration taken from rlgym-ppo. Workers are fully asynchronous and can be distributed if desired, but the setup is limited to a single learner instance.
It is currently set up to use Redis for communication between the worker(s) and the learner. Another networking backend can be substituted by reimplementing the RolloutDatabaseBackend trait, as sketched below. A config file allows for rapid reconfiguration when necessary.
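As a rough illustration of what swapping the backend might look like (the trait's actual method names, signatures, and types in this repository may differ; everything below is a hypothetical sketch, not the project's real API), an alternative backend would implement the same trait the Redis backend does:

```rust
// Hypothetical sketch only: the real RolloutDatabaseBackend trait in this
// project may have different methods and types.
use std::collections::VecDeque;
use std::error::Error;

/// Serialized rollout data produced by a worker (placeholder type).
pub struct RolloutBatch(pub Vec<u8>);

/// Assumed shape of the backend trait: workers push rollouts, the learner
/// pops them and publishes updated policy weights back to the workers.
pub trait RolloutDatabaseBackend {
    fn push_rollout(&mut self, batch: RolloutBatch) -> Result<(), Box<dyn Error>>;
    fn pop_rollouts(&mut self, max: usize) -> Result<Vec<RolloutBatch>, Box<dyn Error>>;
    fn publish_policy(&mut self, weights: Vec<u8>) -> Result<(), Box<dyn Error>>;
    fn latest_policy(&mut self) -> Result<Option<Vec<u8>>, Box<dyn Error>>;
}

/// Example of a drop-in replacement backend: an in-process queue, useful for
/// running a worker and learner inside the same binary instead of over Redis.
#[derive(Default)]
pub struct InMemoryBackend {
    rollouts: VecDeque<RolloutBatch>,
    policy: Option<Vec<u8>>,
}

impl RolloutDatabaseBackend for InMemoryBackend {
    fn push_rollout(&mut self, batch: RolloutBatch) -> Result<(), Box<dyn Error>> {
        self.rollouts.push_back(batch);
        Ok(())
    }

    fn pop_rollouts(&mut self, max: usize) -> Result<Vec<RolloutBatch>, Box<dyn Error>> {
        Ok((0..max).filter_map(|_| self.rollouts.pop_front()).collect())
    }

    fn publish_policy(&mut self, weights: Vec<u8>) -> Result<(), Box<dyn Error>> {
        self.policy = Some(weights);
        Ok(())
    }

    fn latest_policy(&mut self) -> Result<Option<Vec<u8>>, Box<dyn Error>> {
        Ok(self.policy.clone())
    }
}
```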
See the bin folder for the worker and learner examples.
This will likely not be a heavily documented project, as it does not serve much of a purpose right now. If that changes in the future, it may still serve as a starting point or reference for us. There is a minor attempt at documentation in the code, but it is not guaranteed to be easy to read.
At this time, tch-rs does not appear to support saving the optimizer state, which will hurt agent performance when resuming from checkpoints frequently. It also does not appear to be quite as performant overall as our optimized PyTorch/Python code during training. Because of these limitations, plans to grow this into a more feature-rich reinforcement learning framework will likely be put on hold.
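To illustrate the checkpointing limitation, here is a minimal, self-contained sketch (not code from this repository; the network layout, layer names, and file path are made up). The tch-rs VarStore can save and load the network parameters, but the optimizer's internal state (e.g. Adam's moment estimates) is not persisted, so a resumed run restarts the optimizer cold:

```rust
use tch::{nn, nn::Module, nn::OptimizerConfig, Device, Tensor};

fn main() -> Result<(), tch::TchError> {
    let vs = nn::VarStore::new(Device::Cpu);
    let root = vs.root();
    let net = nn::seq()
        .add(nn::linear(&root / "l1", 8, 64, Default::default()))
        .add_fn(|xs| xs.relu())
        .add(nn::linear(&root / "l2", 64, 1, Default::default()));
    let mut opt = nn::Adam::default().build(&vs, 1e-3)?;

    // One dummy training step so the optimizer accumulates internal state.
    let xs = Tensor::randn(&[32, 8], tch::kind::FLOAT_CPU);
    let ys = Tensor::randn(&[32, 1], tch::kind::FLOAT_CPU);
    let loss = net.forward(&xs).mse_loss(&ys, tch::Reduction::Mean);
    opt.backward_step(&loss);

    // Only the network parameters are written; Adam's moment buffers are lost,
    // which is why frequently restarting from checkpoints hurts training here.
    vs.save("checkpoint.ot")?;

    // Loading restores the weights, but the optimizer starts from scratch.
    let mut vs_restored = nn::VarStore::new(Device::Cpu);
    let restored_root = vs_restored.root();
    let _net_restored = nn::seq()
        .add(nn::linear(&restored_root / "l1", 8, 64, Default::default()))
        .add_fn(|xs| xs.relu())
        .add(nn::linear(&restored_root / "l2", 64, 1, Default::default()));
    vs_restored.load("checkpoint.ot")?;
    Ok(())
}
```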