Training LSTMs involves lots of data transformation #158
Comments
My first thought was that the runner should keep the data untouched and we should feed it to the policy in the format [num_steps, num_envs, x].
What do you think?
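To illustrate the proposal above, here is a minimal numpy sketch (array names and dimensions are hypothetical, not taken from the runner code): the rollout buffer is kept exactly as collected, time-major, and handed to the policy without any reshaping.

```python
import numpy as np

num_steps, num_envs, obs_dim = 5, 4, 3

# Rollout buffer kept exactly as the runner collects it:
# one row per timestep, one column per environment.
rollout = np.zeros((num_steps, num_envs, obs_dim))
for t in range(num_steps):
    # One step of observations for all parallel envs.
    rollout[t] = np.random.randn(num_envs, obs_dim)

# The policy would consume the buffer as-is, in [num_steps, num_envs, x] layout.
assert rollout.shape == (num_steps, num_envs, obs_dim)
```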
Yes, I completely agree that LSTM code is overcomplicated (and that is also the reason I avoid using recurrent policies for now ^^"...).
Referencing that PR here: openai#859
I looked at how exactly LSTMs are trained with PPO2 and found that a lot of unnecessary data transformations happen: the data goes from
[num_steps, num_envs, x]
to [num_steps * num_envs, x]
after switching the first two dimensions. All this seems overly complex and potentially slow to me. This is why I would like to open the discussion here on how matters could be improved. Please set your ideas free :-)
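As a sketch of the transformation described above (a minimal numpy version; the actual PPO2 code differs in details, and the array names here are illustrative):

```python
import numpy as np

num_steps, num_envs, x = 5, 4, 3
batch = np.arange(num_steps * num_envs * x).reshape(num_steps, num_envs, x)

# Switch the first two dimensions:
# [num_steps, num_envs, x] -> [num_envs, num_steps, x]
swapped = batch.swapaxes(0, 1)

# Flatten the leading dimensions:
# [num_envs, num_steps, x] -> [num_steps * num_envs, x]
# Each env's trajectory is now a contiguous run of num_steps rows.
flat = swapped.reshape(num_steps * num_envs, x)

assert flat.shape == (num_steps * num_envs, x)
```

The swap is what forces the copy: after `swapaxes`, the array is no longer contiguous, so the `reshape` cannot be a view and numpy must materialize a new buffer, which is part of the overhead being discussed.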