Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rllib] add augmented random search #2714

Merged
merged 25 commits into from
Aug 25, 2018
Merged

Conversation

eugenevinitsky
Copy link
Contributor

What do these changes do?

Augmented random search is added as an algorithm: https://arxiv.org/pdf/1803.07055.pdf
Std deviation of meanstdfilter is initialized to 1

Related issue number

@robertnishihara robertnishihara changed the title Ars pull [rllib] add augmented random search Aug 22, 2018
@ericl ericl self-assigned this Aug 22, 2018
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7675/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7674/
Test PASSed.

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good, some minor comments.

@@ -0,0 +1,360 @@
# Code in this file is copied and adapted from
# https://github.com/openai/evolution-strategies-starter and from
# https://github.com/modestyachts/ARS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, good catch, thanks!


tlogger.record_tabular("TimeElapsedThisIter", step_tend - step_tstart)
tlogger.record_tabular("TimeElapsed", step_tend - self.tstart)
tlogger.dump_tabular()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the tlog stuff still? (I know it's in ES, but it seems unnecessary)

Copy link
Contributor Author

@eugenevinitsky eugenevinitsky Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find some of it useful, but I definitely will remove duplicates. Seeing the std deviation of the weights and the grad norm of the update are useful checks that things are going as planned for example.

"timesteps_this_iter": noisy_lengths.sum(),
"timesteps_so_far": self.timesteps_so_far,
"time_elapsed_this_iter": step_tend - step_tstart,
"time_elapsed": step_tend - self.tstart
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop all but the first 3 stats, since they're already calculated by rllib right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this in the info field of the result instead?

DISABLED = 50


class TbWriter(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we could not add this file.

@@ -0,0 +1,17 @@
# can expect improvement to -140 reward in ~300-500k timesteps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Could we also add an entry for ARS in test_supported_spaces.py?

@ericl
Copy link
Contributor

ericl commented Aug 23, 2018

Lint failed: you should run scripts/yapf.sh to fix.

@eugenevinitsky
Copy link
Contributor Author

eugenevinitsky commented Aug 24, 2018

I get: ./yapf.sh: line 51: mapfile: command not found
do you know what this might refer to?

I did run flake8 to check; that's not sufficient?

@ericl
Copy link
Contributor

ericl commented Aug 24, 2018

@eugenevinitsky that seems to be a compatibility problem with the script, this should fix it: #2735

otherwise, I can run it

@eugenevinitsky
Copy link
Contributor Author

eugenevinitsky commented Aug 24, 2018

Addressed the other comments, still blocked on the yapf thing;
pulling in the fix leads to:
From https://github.com/ray-project/ray

  • branch master -> FETCH_HEAD
    xargs: yapf: No such file or directory

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7734/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7735/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7739/
Test FAILed.

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Did you include the filter change though?

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7740/
Test FAILed.

@eugenevinitsky
Copy link
Contributor Author

Yeah, it's changing np.zeros to np.ones in utils.filter

@@ -62,7 +62,7 @@ class RunningStat(object):
def __init__(self, shape=None):
self._n = 0
self._M = np.zeros(shape)
self._S = np.zeros(shape)
self._S = np.ones(shape)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relevant filter change, I think.

@eugenevinitsky
Copy link
Contributor Author

Thanks for those fixes btw

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7741/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7743/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7744/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7745/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7750/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7749/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7752/
Test PASSed.

@ericl
Copy link
Contributor

ericl commented Aug 25, 2018

Going to merge this sans the filter change, since that breaks some of the filter unit tests.

@ericl ericl merged commit 6201a6d into ray-project:master Aug 25, 2018
@akaniklaus
Copy link

Thanks a lot for this. Is there any particular reason why this is not utilizing GPU for computation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants