
[question] Reproduce the result of PPO on RoboschoolHumanoidFlagrunHarder #179

Open
doviettung96 opened this issue Jan 30, 2019 · 17 comments
Labels
question Further information is requested

Comments

@doviettung96

doviettung96 commented Jan 30, 2019

Hi @araffin,
Currently I am trying to reproduce the results of the PPO paper on the RoboschoolHumanoidFlagrunHarder environment. Although I have tried almost every setting, there is still a big gap between my results and theirs. I have modified the code to make logstd = LinearAnneal(-0.7, -1.6) as in the paper.
When I printed the logstd in distributions.py, I got:

<tf.Variable 'model/pi/logstd:0' shape=(1, 17) dtype=float32_ref>
However, when I tried to add the following code at the end of the PPO2 constructor:
with tf.variable_scope("model", reuse=True):
    self.logstd = tf.get_variable(name='pi/logstd:0')

I got this:

ValueError: Variable model/pi/logstd:0 does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

I also tried using just the variable name "pi/logstd", but it still failed.
How can I change the value of logstd during training?
Thanks.
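
For reference, the schedule from the paper is just a linear interpolation from -0.7 to -1.6 over training. A minimal sketch of such a helper (the linear_anneal name and the progress argument below are illustrative, not a stable-baselines API):

def linear_anneal(start=-0.7, end=-1.6):
    # Returns a schedule mapping training progress in [0, 1] to a logstd value.
    def schedule(progress):
        return start + (end - start) * progress
    return schedule

logstd_schedule = linear_anneal(-0.7, -1.6)
logstd_schedule(0.0)   # -0.7 at the start of training
logstd_schedule(0.5)   # -1.15 halfway through
logstd_schedule(1.0)   # -1.6 at the end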

@araffin added the question (Further information is requested) label on Jan 30, 2019
@araffin
Collaborator

araffin commented Feb 2, 2019

Hello,

I also tried using just the variable name "pi/logstd", but it still failed.

I think the variable you are looking for is created here, when it is called from the policy.
I would check what the scope of that variable is (probably model/... and not directly pi/).

@doviettung96
Author

Yeah, that variable is created when the policy is created. In PPO2, the first call is in the construction of step_model, so the variable scope is "model". Please let me know if you manage to get that variable.
Thanks.

@araffin
Collaborator

araffin commented Feb 3, 2019

This works for me (without the :0):

with tf.variable_scope('model', reuse=True):
    print(tf.get_variable(name='pi/logstd'))
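
To actually change the value during training, one option (a sketch assuming a stable-baselines PPO2 instance named model, not an official API) is to fetch the variable once, build an assign op, and run it through the model's session, e.g. from a learning callback:

import numpy as np
import tensorflow as tf

# Sketch: fetch the logstd variable after the PPO2 model is built, then
# overwrite it with the annealed value through the model's session.
# The placeholder and assign op are created here; they are not part of PPO2.
with model.graph.as_default():
    with tf.variable_scope('model', reuse=True):
        logstd_var = tf.get_variable(name='pi/logstd')
    new_logstd_ph = tf.placeholder(tf.float32, shape=logstd_var.shape)
    assign_logstd = tf.assign(logstd_var, new_logstd_ph)

def set_logstd(value):
    # Broadcast a scalar logstd to every action dimension (shape (1, 17) here).
    model.sess.run(assign_logstd,
                   feed_dict={new_logstd_ph: np.full(logstd_var.shape.as_list(), value)})

Calling set_logstd from a callback with the annealed value (e.g. stepping from -0.7 towards -1.6) would then update the policy's exploration noise as training progresses.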

@doviettung96
Author

I will try that. Thank you.

@BruceK4t1qbit

@doviettung96 Let me know if you're able to train RoboschoolHumanoidFlagrunHarder successfully. I was not able to, even with annealing the logstd.

@doviettung96
Author

@BruceK4t1qbit,
How good is your trained agent? Could you provide some statistics, like the mean reward or the TensorBoard graph? I am trying to use logstd annealing, but for now I am running into a problem building the Roboschool library from source.
Anyway, did you try all the settings from the PPO paper?
Thanks.

@BruceK4t1qbit

@doviettung96
I tried to use all the settings from the PPO paper (it was a while ago, so I forget the details). I modified the original baselines code to do this.

I didn't use TensorBoard; I just looked at the rendering.

I've found that pybullet_envs is much easier to install than Roboschool...

@doviettung96
Author

@BruceK4t1qbit,
I think you would need the mean episode reward to have something to compare against. For now, I have also changed the code to use all the settings. As suggested, OpenAI Baselines and Stable Baselines are not the original version of the code used in the PPO paper, so I am not sure whether we can reproduce the result. If you find any improvement, please let me know.
Thanks.

@doviettung96
Author

@BruceK4t1qbit,
I just ran a test and found that logstd annealing is not important. The result is still quite far from the paper.

@BruceK4t1qbit

@doviettung96
I recently also tried SAC on it, which seemed to get stuck in the same local optimum...

@doviettung96
Author

@BruceK4t1qbit,
Really? My next step is also DDPG, TD3 and SAC. Given this news, I don't know whether we can train it successfully. Thanks.

@doviettung96
Author

@BruceK4t1qbit,
I have not tested the result carefully, but as of now my result is quite close to the trained agent shipped with Roboschool.
Everything is set as in the paper, and the agent is trained for 400M timesteps (as for the Roboschool trained agent).
Just try it again.

@ernestum
Collaborator

So it is just a matter of good luck?

@doviettung96
Author

I don't think so. Changes are necessary to improve performance on those tasks; it is just quite difficult to know what we should add to improve it.

@ernestum
Collaborator

So what did you change?

@doviettung96
Author

@erniejunior,
It depends on your starting point. If your starting point is the PPO paper, just increase the total timesteps to 400M and use a deeper network (hidden layers of 512-256-128 with ReLU activation, for example).
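
If you are using stable-baselines, one way to get such a network is sketched below (assuming stable-baselines 2.x and a Roboschool install that registers RoboschoolHumanoidFlagrunHarder-v1; all other PPO hyperparameters are left at their defaults here and would still need to be matched to the paper):

import tensorflow as tf
import roboschool  # registers the Roboschool gym environments
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

# Deeper 512-256-128 MLP with ReLU activations, trained for 400M timesteps.
model = PPO2(MlpPolicy, 'RoboschoolHumanoidFlagrunHarder-v1',
             policy_kwargs=dict(net_arch=[512, 256, 128], act_fun=tf.nn.relu),
             verbose=1)
model.learn(total_timesteps=int(4e8))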

@BruceK4t1qbit

Thanks! I didn't try such a big network.
