Unable to reproduce result on HalfCheetah-v2 #75

Closed
quanvuong opened this issue Apr 17, 2019 · 18 comments
@quanvuong

I am unable to obtain the result reported in the paper on the OpenAI Gym environment HalfCheetah-v2. The commit used to obtain this result is 1f6147c, which is fairly recent. The result is averaged over 5 random initial seeds.

[figure: HalfCheetah-v2 learning curve, averaged over 5 seeds]

Do you know what might be causing this issue? Thank you!

I am able to obtain the results reported in the paper (or close to them) on the remaining environments; they are posted here for reference.

[figures: Ant, Walker, Humanoid, and Hopper learning curves]

@hartikainen
Member

Thanks a lot for reporting this! This is not expected, and it's not immediately obvious to me what might cause it. Just to make sure: are all these figures produced with the default values in the variants.py? How many seeds are in each figure?

@quanvuong
Author

yup, the figures are produced with the default values in variants.py.

The results in each figure are averaged over 5 random initial seeds.

@hartikainen
Member

Thanks! I'll try to look into this soon.

@hartikainen
Member

Hey, I just ran HalfCheetah from the latest master, and unfortunately cannot reproduce the problems you're seeing. Here are the results I see across 6 seeds:
[figures: HalfCheetah training and evaluation returns across 6 seeds]

The results are a tiny bit worse than what we report in our paper [1], maybe because of the upgrade from mujoco 1.5 to 2.0.

Maybe there's something different in your environment? Could you post the output of pip freeze? Also, which mujoco version are you using?

[1] https://arxiv.org/pdf/1812.05905.pdf

@quanvuong
Author

Thank you for your comment!

I will rerun master and let you know how it goes here.

@quanvuong
Author

Thank you for your effort in maintaining the repo. It has been super useful and illuminating to read the code. Also, thank you for rerunning master and helping me get to the bottom of my issue. Please find my results below:

Running master from 5th May, which corresponds to commit 1f6686d, does not reproduce the reported results: in my runs, 1 out of 5 seeds is significantly worse than the others. The performance graph is shown below.

[figure: HalfCheetah evaluation return-average, individual runs]

However, you might not be able to reproduce this result by setting the initial seeds to the values used in my runs, because of issue #80.

After I also set the seed of the environment and fixed the values of the initial seeds across runs, I obtained similar results, with some runs significantly worse than others.

[figures: HalfCheetah training and evaluation return-average, individual runs]
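(As a point of reference, here is a minimal sketch of the kind of per-run seeding described above; the helper name seed_run and the exact set of RNGs are illustrative, not softlearning's actual code:)

import random
import numpy as np
import gym

def seed_run(env, seed):
    # Pin the RNGs that can influence a run. The deep-learning framework's own
    # seed would also need to be set separately through its own API.
    random.seed(seed)
    np.random.seed(seed)
    env.seed(seed)               # environment RNG
    env.action_space.seed(seed)  # separate RNG behind env.action_space.sample(), cf. #80

env = gym.make("HalfCheetah-v2")
seed_run(env, seed=0)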

The output of pip freeze is attached below.

[attachment: output.txt]

I am using mujoco 1.50 for Linux.

@hartikainen
Member

Thanks for the detailed response! This still seems a little odd since I'm not able to get any seeds failing like that.

One difference I see in your pip freeze output is the mujoco-py version. You have mujoco-py==1.50.1.68 whereas we recently upgraded to mujoco-py==2.0.2.0. Could you try running pip install -U gym mujoco-py and installing mujoco 2.0, and see if that solves the issue?
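(A quick, generic way to double-check which versions are actually importable in the active environment, independent of the pip freeze file; this snippet is not part of softlearning:)

import pkg_resources  # ships with setuptools

for pkg in ("gym", "mujoco-py", "numpy"):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")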

@Nicolinho

I observed results very similar to @quanvuong's (including runs that converged at ~2k return). I used mujoco 2.0 and mujoco-py 2.0.2.0.

@hartikainen
Member

@Nicolinho would you mind pasting the output of your pip freeze and conda list here?

@Nicolinho

@hartikainen
Member

hartikainen commented May 19, 2019

Hey @quanvuong and @Nicolinho, thanks a lot for providing the information. I spent a bit of time debugging this today. I couldn't find anything obvious that could be wrong, but I was eventually able to reproduce this issue on one of my machines, where 2/4 seeds failed. The weird thing is that on other machines the cheetah still runs completely fine; I ran a total of 30 seeds on 3 different machines and none of them had the issue. 10 of those seeds can be found in a comment on the latest PR: #85 (comment).

I'll try to dig deeper soon and let you know if I find anything.

@hartikainen
Member

Looks like the policy for some reason flips the cheetah over early in training and thus gets stuck in a local minimum:
[media: output — rollout of the stuck, flipped-over policy]

@quanvuong
Author

It was so satisfying to finally know why this happened. Thank you for the investigative work!

@szrlee

szrlee commented Jun 7, 2019

(Quoting @hartikainen's earlier comment: "Hey, I just ran HalfCheetah from the latest master, and unfortunately cannot reproduce the problems you're seeing. [...]")

Hi @hartikainen, for the curve of a single trial, do you use a smoothed curve?
My reproduction is not very stable. The plot below is from TensorBoard with smoothing=0.6; each curve is from one random seed.

[figure: HalfCheetah-v2 evaluation episode-reward-mean, 5 seeds (Ray Tune)]

Thanks!

@hartikainen
Member

@szrlee yeah, my figures were smoothed with viskit's default smoothing.
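
(For readers comparing the two sets of plots: TensorBoard's smoothing slider applies roughly an exponential moving average with the slider value as the weight, while viskit uses its own default smoothing, so the curves are not directly comparable. A minimal EMA sketch, ignoring TensorBoard's debiasing of the early values:)

def ema_smooth(values, weight=0.6):
    # Exponential moving average in the spirit of TensorBoard's smoothing slider.
    smoothed, last = [], values[0]
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed

# Example: smooth a noisy return curve before eyeballing it against another plot.
print(ema_smooth([0, 1000, 800, 3000, 2500, 6000], weight=0.6))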

@amiranas

amiranas commented Jun 24, 2019

I think the problem is that gym has two separate random seeds for an environment. One is set when the environment is created. The other is the seed of env.action_space, which controls the random actions returned by env.action_space.sample(). For some reason the action space seed is different from the main seed, and in many gym versions the action space seed is always set to 0. I'm not certain what the current gym version does, but just to be sure I always set the action space seed myself.
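
(To make the two separate RNGs concrete, a minimal illustration; gym's exact seeding behavior varies across versions, so treat this as a sketch rather than a documented contract:)

import gym

env = gym.make("HalfCheetah-v2")
env.seed(123)               # seeds the environment's own RNG (resets, noise)

# env.action_space keeps a *separate* RNG; env.seed() does not touch it, so the
# random warm-up actions drawn via sample() are not controlled by the line above.
env.action_space.seed(123)  # seed it explicitly to make sample() reproducible
print(env.action_space.sample())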

I ran some SAC experiments with a current gym version and first set the action space seed to zero by adding the following lines to the GymAdapter class:

env = gym.envs.make(env_id, **kwargs)
env.action_space.seed(0)  # pin the RNG used by env.action_space.sample()

The result after 1 million timesteps was similar to those shown above (for HalfCheetah-v2):

[figure: 0_seed — returns with the action space seed fixed to 0]

Then I reran the experiment with a random action space seed:

env = gym.envs.make(env_id, **kwargs)
# draw a fresh random seed per run (requires `import os` at the top of the file)
env.action_space.seed(int.from_bytes(os.urandom(4), byteorder='big'))

With this, 3 of the 9 runs got stuck in the back-flip policy:

[figure: rand_seed — returns with random action space seeds, 9 runs]

This problem seems to go away when I increase the number of random exploration steps at the beginning of training (n_initial_exploration_steps) to 10000. The result for 9 runs with random action space seeds was:

[figure: rand_seed_long_init — returns with random action space seeds and 10k initial exploration steps, 9 runs]
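
(For illustration, the effect of n_initial_exploration_steps can be sketched as a generic warm-up loop; this is not softlearning's actual sampler code, and warmup_steps is a placeholder name:)

import gym

env = gym.make("HalfCheetah-v2")
env.action_space.seed(0)

warmup_steps = 10000  # analogous to raising n_initial_exploration_steps
obs = env.reset()
for _ in range(warmup_steps):
    action = env.action_space.sample()         # uniform random action, ignores the policy
    obs, reward, done, info = env.step(action)
    # (in the real training loop, the transition would be pushed to the replay buffer here)
    if done:
        obs = env.reset()
# After the warm-up, actions come from the learned policy instead.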

Gym commit: 98917cf
SAC commit: b4db23a
Mujoco py: a1b6e31
Mujoco 2.0

@xanderdunn

I was able to readily reproduce the paper's results on HalfCheetah-v2 without changing any defaults:
[figure: evaluation mean reward curves for four seeds]
All four seeds achieve >15,000 evaluation mean reward within the first 3M timesteps, taking 10.7 hours of simultaneous compute on 2x A100s. The seed values were also set via env.action_space.seed() and env.seed(), as @amiranas described.

MuJoCo 2.0
softlearning 46f1443

@hartikainen
Member

Nice, thanks a ton for reporting these @xanderdunn! Seems like this issue can be closed now.

dvalenciar pushed a commit to UoA-CARES/gymnasium_envrionments that referenced this issue Apr 5, 2024
* Updated to match new priority methods in CARES RL

* latest for using PER buffer with new CARES RL

* fixed image_wrapper to have sample function

* fixed seed setting for openai gym

* Updated seed for action space to (0) based on similar issues here: rail-berkeley/softlearning#75

* revert seed 0