
Guide for using LSTM with PPO2 #231

Open
pulver22 opened this issue Mar 11, 2019 · 3 comments
Labels
question Further information is requested

Comments

@pulver22

Hi,

I'm trying to learn navigation policies in a 3D environment using an LSTM policy with PPO2, and I'm having trouble figuring out which parameters to use.

My episodes usually last 200 steps, and with PPO2 + CNN I used n_steps=800 to get fairly stable learning. The input to the network was a stack of 4 images.

I was wondering how I should change this value when using an LSTM. I noticed that using 800 means feeding a sequence of 800 images to the network, which is quite long.

Can anyone give me a suggestion based on their experience?
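
For reference, this is roughly my current setup (an untested sketch; the environment id `MyNav3D-v0` is just a placeholder for my custom 3D environment):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecFrameStack

# Placeholder for the custom 3D navigation env (episodes last ~200 steps)
env = DummyVecEnv([lambda: gym.make('MyNav3D-v0')])
# Stack the last 4 frames so the feedforward CNN gets some temporal context
env = VecFrameStack(env, n_stack=4)

# n_steps=800 (four episodes' worth of transitions) gave fairly stable learning
model = PPO2('CnnPolicy', env, n_steps=800, verbose=1)
model.learn(total_timesteps=int(1e6))
```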

@araffin
Collaborator

araffin commented Mar 23, 2019

Hello,
Maybe @erniejunior can help you?
Anyway, n_steps=800 is huge when using a recurrent policy, especially if you are also stacking images.
I would recommend not stacking images: that trick is normally meant for feedforward networks (to provide some temporal information), and recurrent policies have a memory that should replace it.
Also, try using fewer steps, otherwise your training will be slow (and will require a lot of RAM).
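
Something like this, for instance (an untested sketch; `MyNav3D-v0` is a placeholder, and remember that with recurrent policies the number of parallel environments must be a multiple of nminibatches):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Several parallel envs: for recurrent policies, n_envs must be
# a multiple of nminibatches
n_envs = 4
env = DummyVecEnv([lambda: gym.make('MyNav3D-v0') for _ in range(n_envs)])

# No frame stacking: the LSTM memory replaces it. A shorter n_steps
# keeps the unrolled graph (and RAM usage) manageable.
model = PPO2('CnnLstmPolicy', env, n_steps=128, nminibatches=4, verbose=1)
model.learn(total_timesteps=int(1e6))
```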

@araffin araffin added the question Further information is requested label Mar 23, 2019
@pulver22
Author

Yes, I was using a single channel (only 1 image) with the LSTM.
The problem I had with shorter rollouts is that the network never learned to achieve the task: PPO updates too quickly, before collecting significant transitions. A possible solution would be to start with short episodes and the agent fairly close to the target, so it can learn basic navigation primitives, and then use curriculum learning to increase the episode length and the distance from the target. But this is just an idea.
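
Roughly what I have in mind (an untested sketch; `set_target_distance` is a hypothetical hook into my environment):

```python
import gym

class CurriculumWrapper(gym.Wrapper):
    """Gradually lengthen episodes and move the target further away."""

    def __init__(self, env, start_steps=50, max_steps=200, growth=1.05):
        super(CurriculumWrapper, self).__init__(env)
        self.ep_limit = start_steps
        self.max_steps = max_steps
        self.growth = growth
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        # Hypothetical hook: spawn the target closer when episodes are short
        if hasattr(self.env, 'set_target_distance'):
            self.env.set_target_distance(self.ep_limit / float(self.max_steps))
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        if self.t >= self.ep_limit:
            done = True
        if done:
            # Grow the episode budget each episode, up to the full length
            self.ep_limit = min(self.max_steps, int(self.ep_limit * self.growth))
        return obs, reward, done, info
```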

@ernestum
Collaborator

Sorry, I tried and abandoned LSTMs because of a lack of success ...
The large number of steps means that you will have a huge TensorFlow graph, because in the current implementation the LSTM is unrolled manually. So an implementation based on Keras might at least be more computationally efficient (see #161 for a non-working draft to get an idea).
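
To illustrate the graph-size issue (a TF 1.x sketch, not the actual stable-baselines code): manual unrolling adds a copy of the cell's ops for every time step, while something like `tf.nn.dynamic_rnn` builds a single while_loop node:

```python
import tensorflow as tf

n_steps, batch, dim, units = 800, 1, 64, 256
inputs = tf.placeholder(tf.float32, [batch, n_steps, dim])

# Manual unrolling: one set of ops is added to the graph for every
# time step, so the graph size grows linearly with n_steps.
with tf.variable_scope('static'):
    cell = tf.nn.rnn_cell.LSTMCell(units)
    state = cell.zero_state(batch, tf.float32)
    outputs = []
    for t in range(n_steps):
        out, state = cell(inputs[:, t, :], state)
        outputs.append(out)

# A dynamic implementation builds a single while_loop instead,
# so the graph stays small no matter how large n_steps is.
with tf.variable_scope('dynamic'):
    cell = tf.nn.rnn_cell.LSTMCell(units)
    dyn_out, dyn_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```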
