Unable to reproduce result on HalfCheetah-v2 #75

Closed
quanvuong opened this issue Apr 17, 2019 · 18 comments
@quanvuong

I am unable to obtain the result reported in the paper on the OpenAI Gym environment HalfCheetah-v2. The commit used to obtain this result is 1f6147c, which is fairly recent. The result is averaged over 5 random initial seeds.

[figure: HalfCheetah-v2 learning curve, averaged over 5 seeds]

Do you know what might be causing this issue? Thank you!

I am able to obtain the results reported in the paper (or close to them) on the remaining environments; they are posted here for reference.

[figures: Ant, Walker, Humanoid, and Hopper learning curves]

@hartikainen
Member

Thanks a lot for reporting this! This is not expected, and it's not immediately obvious to me what might cause it. Just to make sure: are all these figures produced with the default values in the variants.py? How many seeds are in each figure?

@quanvuong
Author

yup, the figures are produced with the default values in variants.py.

The results in each figure are averaged over 5 random initial seeds.

@hartikainen
Member

Thanks! I'll try to look into this soon.

@hartikainen
Member

Hey, I just ran HalfCheetah from the latest master, and unfortunately cannot reproduce the problems you're seeing. Here are the results I see across 6 seeds:
[figures: HalfCheetah training and evaluation returns across 6 seeds]

The results are a tiny bit worse than what we report in our paper [1], maybe because of the upgrade from mujoco 1.5 to 2.0.

Maybe there's something different in your environment? Could you post the output of pip freeze? Also, which mujoco version are you using?

[1] https://arxiv.org/pdf/1812.05905.pdf

@quanvuong
Author

Thank you for your comment!

I will rerun master and let you know how it goes here.

@quanvuong
Author

Thank you for your effort in maintaining the repo. It has been super useful and illuminating to read the code. Also, thank you for rerunning master and helping me get to the bottom of my issue. Please find my results below:

Running master from 5th May, which corresponds to commit 1f6686d, does not reproduce the reported results: in my runs, 1 out of 5 seeds is significantly worse than the others. The performance graph is shown below.

[figure: HalfCheetah evaluation return-average, individual runs]

However, you might not be able to reproduce this result by setting the initial seeds to the values used in my runs, because of issue #80.

After I also set the seed of the environment and fixed the values of the initial seeds across runs, I obtained similar results, with some runs significantly worse than others.

[figures: HalfCheetah training and evaluation return-average, individual runs]
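(As a point of reference, here is a minimal sketch of the kind of per-run seeding described above; the helper name seed_run and the exact set of RNGs are illustrative, not softlearning's actual code:)

import random
import numpy as np
import gym

def seed_run(env, seed):
    # Pin the RNGs that can influence a run. The deep-learning framework's own
    # seed would also need to be set separately through its own API.
    random.seed(seed)
    np.random.seed(seed)
    env.seed(seed)               # environment RNG
    env.action_space.seed(seed)  # separate RNG behind env.action_space.sample(), cf. #80

env = gym.make("HalfCheetah-v2")
seed_run(env, seed=0)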

The output of pip freeze is attached below.

[attachment: output.txt]

I am using mujoco 1.50 for Linux.

@hartikainen
Member

Thanks for the detailed response! This still seems a little odd since I'm not able to get any seeds failing like that.

One difference I see in your pip freeze output is the mujoco-py version. You have mujoco-py==1.50.1.68 whereas we recently upgraded to mujoco-py==2.0.2.0. Could you try running pip install -U gym mujoco-py and installing mujoco 2.0, and see if that solves the issue?
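(A quick, generic way to double-check which versions are actually importable in the active environment, independent of the pip freeze file; this snippet is not part of softlearning:)

import pkg_resources  # ships with setuptools

for pkg in ("gym", "mujoco-py", "numpy"):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")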

@Nicolinho

I observed results very similar to @quanvuong's (including runs that converged at ~2k return). I used mujoco 2.0 and mujoco-py 2.0.2.0.

@hartikainen
Member

@Nicolinho would you mind pasting the output of your pip freeze and conda list here?

@Nicolinho

@hartikainen
Member

hartikainen commented May 19, 2019

Hey @quanvuong and @Nicolinho, thanks a lot for providing the information. I spent a bit of time debugging this today. I couldn't find anything obvious that could be wrong, but I was eventually able to reproduce this issue on one of my machines, where 2/4 seeds failed. The weird thing is that on other machines the cheetah still runs completely fine; I ran a total of 30 seeds on 3 different machines and none of them had the issue. 10 of those seeds can be found in a comment on the latest PR: #85 (comment).

I'll try to dig deeper soon and let you know if I find anything.

@hartikainen
Member

Looks like the policy for some reason flips the cheetah over early in training and thus gets stuck in a local minimum:
[media: output — rollout of the stuck, flipped-over policy]

@quanvuong
Author

It was so satisfying to finally know why this happened. Thank you for the investigative work!

@szrlee

szrlee commented Jun 7, 2019

(Quoting @hartikainen's earlier comment: "Hey, I just ran HalfCheetah from the latest master, and unfortunately cannot reproduce the problems you're seeing. [...]")

Hi @hartikainen, for the curve of a single trial, do you use a smoothed curve?
My reproduction is not very stable. The plot below is from TensorBoard with smoothing=0.6; each curve is from one random seed.

[figure: HalfCheetah-v2 evaluation episode-reward-mean, 5 seeds (Ray Tune)]

Thanks!

@hartikainen
Member

@szrlee yeah, my figures were smoothed with viskit's default smoothing.
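
(For readers comparing the two sets of plots: TensorBoard's smoothing slider applies roughly an exponential moving average with the slider value as the weight, while viskit uses its own default smoothing, so the curves are not directly comparable. A minimal EMA sketch, ignoring TensorBoard's debiasing of the early values:)

def ema_smooth(values, weight=0.6):
    # Exponential moving average in the spirit of TensorBoard's smoothing slider.
    smoothed, last = [], values[0]
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed

# Example: smooth a noisy return curve before eyeballing it against another plot.
print(ema_smooth([0, 1000, 800, 3000, 2500, 6000], weight=0.6))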

@amiranas

amiranas commented Jun 24, 2019

I think the problem is that gym has two separate random seeds for an environment. One is set when the environment is created. The other is the seed of env.action_space, which controls the random actions returned by env.action_space.sample(). For some reason the action space seed is different from the main seed, and in many gym versions the action space seed is always set to 0. I'm not certain what the current gym version does, but just to be sure I always set the action space seed myself.
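
(To make the two separate RNGs concrete, a minimal illustration; gym's exact seeding behavior varies across versions, so treat this as a sketch rather than a documented contract:)

import gym

env = gym.make("HalfCheetah-v2")
env.seed(123)               # seeds the environment's own RNG (resets, noise)

# env.action_space keeps a *separate* RNG; env.seed() does not touch it, so the
# random warm-up actions drawn via sample() are not controlled by the line above.
env.action_space.seed(123)  # seed it explicitly to make sample() reproducible
print(env.action_space.sample())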

I ran some SAC experiments with a current gym version and first set the action space seed to zero by adding the following lines to the GymAdapter class:

env = gym.envs.make(env_id, **kwargs)
env.action_space.seed(0)  # pin the RNG used by env.action_space.sample()

The result after 1 million timesteps was similar to those shown above (for HalfCheetah-v2):

[figure: 0_seed — returns with the action space seed fixed to 0]

Then I reran the experiment with a random action space seed:

env = gym.envs.make(env_id, **kwargs)
# draw a fresh random seed per run (requires `import os` at the top of the file)
env.action_space.seed(int.from_bytes(os.urandom(4), byteorder='big'))

With this, 3 of the 9 runs got stuck in the back-flip policy:

[figure: rand_seed — returns with random action space seeds, 9 runs]

This problem seems to go away when I increase the number of random exploration steps at the beginning of training (n_initial_exploration_steps) to 10000. The result for 9 runs with random action space seeds was:

[figure: rand_seed_long_init — returns with random action space seeds and 10k initial exploration steps, 9 runs]
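
(For illustration, the effect of n_initial_exploration_steps can be sketched as a generic warm-up loop; this is not softlearning's actual sampler code, and warmup_steps is a placeholder name:)

import gym

env = gym.make("HalfCheetah-v2")
env.action_space.seed(0)

warmup_steps = 10000  # analogous to raising n_initial_exploration_steps
obs = env.reset()
for _ in range(warmup_steps):
    action = env.action_space.sample()         # uniform random action, ignores the policy
    obs, reward, done, info = env.step(action)
    # (in the real training loop, the transition would be pushed to the replay buffer here)
    if done:
        obs = env.reset()
# After the warm-up, actions come from the learned policy instead.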

Gym commit: 98917cf
SAC commit: b4db23a
Mujoco py: a1b6e31
Mujoco 2.0

@xanderdunn

I was able to readily reproduce the paper's results on HalfCheetah-v2 without changing any defaults:
[figure: evaluation mean reward curves for four seeds]
All four seeds achieve >15,000 evaluation mean reward within the first 3M timesteps, taking 10.7 hours of simultaneous compute on 2x A100s. The seed values were also set via env.action_space.seed() and env.seed(), as @amiranas described.

MuJoCo 2.0
softlearning 46f1443

@hartikainen
Member

Nice, thanks a ton for reporting these @xanderdunn! Seems like this issue can be closed now.

dvalenciar pushed a commit to UoA-CARES/gymnasium_envrionments that referenced this issue Apr 5, 2024
* Updated to match new priority methods in CARES RL

* latest for using PER buffer with new CARES RL

* fixed image_wrapper to have sample function

* fixed seed setting for openai gym

* Updated seed for action space to (0) based on similar issues here: rail-berkeley/softlearning#75

* revert seed 0