
Guide for using LSTM with PPO2 #231

Open
pulver22 opened this issue Mar 11, 2019 · 3 comments
Labels
question Further information is requested

Comments

@pulver22

Hi,

I'm trying to learn navigation policies in a 3D environment using an LSTM policy with PPO2, and I'm having trouble figuring out which parameters to use.

My episodes usually last 200 steps, and with PPO2 + CNN I used n_steps=800 to get fairly stable learning. The input to the network was a stack of 4 images.

I was wondering how I should change this value when using an LSTM. I noticed that using 800 means feeding a sequence of 800 images to the network, which is quite long.

Can anyone give me a suggestion based on their experience?
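
For reference, this is roughly my current setup (an untested sketch; the environment id `MyNav3D-v0` is just a placeholder for my custom 3D environment):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecFrameStack

# Placeholder for the custom 3D navigation env (episodes last ~200 steps)
env = DummyVecEnv([lambda: gym.make('MyNav3D-v0')])
# Stack the last 4 frames so the feedforward CNN gets some temporal context
env = VecFrameStack(env, n_stack=4)

# n_steps=800 (four episodes' worth of transitions) gave fairly stable learning
model = PPO2('CnnPolicy', env, n_steps=800, verbose=1)
model.learn(total_timesteps=int(1e6))
```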

@araffin
Collaborator

araffin commented Mar 23, 2019

Hello,
Maybe @erniejunior can help you?
Anyway, n_steps=800 is huge when using a recurrent policy, especially if you are also stacking images.
I would recommend not stacking images: that trick is normally meant for feedforward networks (to provide some temporal information), and recurrent policies have a memory that should replace it.
Also, try using fewer steps, otherwise your training will be slow (and will require a lot of RAM).
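
Something like this, for instance (an untested sketch; `MyNav3D-v0` is a placeholder, and remember that with recurrent policies the number of parallel environments must be a multiple of nminibatches):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Several parallel envs: for recurrent policies, n_envs must be
# a multiple of nminibatches
n_envs = 4
env = DummyVecEnv([lambda: gym.make('MyNav3D-v0') for _ in range(n_envs)])

# No frame stacking: the LSTM memory replaces it. A shorter n_steps
# keeps the unrolled graph (and RAM usage) manageable.
model = PPO2('CnnLstmPolicy', env, n_steps=128, nminibatches=4, verbose=1)
model.learn(total_timesteps=int(1e6))
```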

@araffin araffin added the question Further information is requested label Mar 23, 2019
@pulver22
Author

Yes, I was using a single channel (only 1 image) with the LSTM.
The problem I had with shorter rollouts is that the network never learned to achieve the task: PPO updates too quickly, before collecting significant transitions. A possible solution would be to start with short episodes and the agent fairly close to the target, so it can learn basic navigation primitives, and then use curriculum learning to increase the episode length and the distance from the target. But this is just an idea.
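
Roughly what I have in mind (an untested sketch; `set_target_distance` is a hypothetical hook into my environment):

```python
import gym

class CurriculumWrapper(gym.Wrapper):
    """Gradually lengthen episodes and move the target further away."""

    def __init__(self, env, start_steps=50, max_steps=200, growth=1.05):
        super(CurriculumWrapper, self).__init__(env)
        self.ep_limit = start_steps
        self.max_steps = max_steps
        self.growth = growth
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        # Hypothetical hook: spawn the target closer when episodes are short
        if hasattr(self.env, 'set_target_distance'):
            self.env.set_target_distance(self.ep_limit / float(self.max_steps))
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        if self.t >= self.ep_limit:
            done = True
        if done:
            # Grow the episode budget each episode, up to the full length
            self.ep_limit = min(self.max_steps, int(self.ep_limit * self.growth))
        return obs, reward, done, info
```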

@ernestum
Collaborator

Sorry, I tried and abandoned LSTMs because of a lack of success ...
The large number of steps means that you will have a huge TensorFlow graph, because in the current implementation the LSTM is unrolled manually. So an implementation based on Keras might at least be more computationally efficient (see #161 for a non-working draft to get an idea).
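
To illustrate the graph-size issue (a TF 1.x sketch, not the actual stable-baselines code): manual unrolling adds a copy of the cell's ops for every time step, while something like `tf.nn.dynamic_rnn` builds a single while_loop node:

```python
import tensorflow as tf

n_steps, batch, dim, units = 800, 1, 64, 256
inputs = tf.placeholder(tf.float32, [batch, n_steps, dim])

# Manual unrolling: one set of ops is added to the graph for every
# time step, so the graph size grows linearly with n_steps.
with tf.variable_scope('static'):
    cell = tf.nn.rnn_cell.LSTMCell(units)
    state = cell.zero_state(batch, tf.float32)
    outputs = []
    for t in range(n_steps):
        out, state = cell(inputs[:, t, :], state)
        outputs.append(out)

# A dynamic implementation builds a single while_loop instead,
# so the graph stays small no matter how large n_steps is.
with tf.variable_scope('dynamic'):
    cell = tf.nn.rnn_cell.LSTMCell(units)
    dyn_out, dyn_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```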
