-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rllib] add augmented random search #2714
Conversation
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good, some minor comments.
@@ -0,0 +1,360 @@ | |||
# Code in this file is copied and adapted from | |||
# https://github.com/openai/evolution-strategies-starter and from | |||
# https://github.com/modestyachts/ARS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you append https://github.com/modestyachts/ARS/blob/master/LICENSE to ray/LICENSE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, good catch, thanks!
python/ray/rllib/agents/ars/ars.py
Outdated
|
||
tlogger.record_tabular("TimeElapsedThisIter", step_tend - step_tstart) | ||
tlogger.record_tabular("TimeElapsed", step_tend - self.tstart) | ||
tlogger.dump_tabular() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the tlog stuff still? (I know it's in ES, but it seems unnecessary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find some of it useful, but I definitely will remove duplicates. Seeing the std deviation of the weights and the grad norm of the update are useful checks that things are going as planned for example.
python/ray/rllib/agents/ars/ars.py
Outdated
"timesteps_this_iter": noisy_lengths.sum(), | ||
"timesteps_so_far": self.timesteps_so_far, | ||
"time_elapsed_this_iter": step_tend - step_tstart, | ||
"time_elapsed": step_tend - self.tstart |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can drop all but the first 3 stats, since they're already calculated by rllib right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this in the info field of the result instead?
DISABLED = 50 | ||
|
||
|
||
class TbWriter(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could not add this file.
@@ -0,0 +1,17 @@ | |||
# can expect improvement to -140 reward in ~300-500k timesteps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Could we also add an entry for ARS in test_supported_spaces.py
?
Lint failed: you should run |
I get: ./yapf.sh: line 51: mapfile: command not found I did run flake8 to check; that's not sufficient? |
@eugenevinitsky that seems to be a compatibility problem with the script, this should fix it: #2735 otherwise, I can run it |
Addressed the other comments, still blocked on the yapf thing;
|
Test FAILed. |
Test FAILed. |
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Did you include the filter change though?
Test FAILed. |
Yeah, it's changing np.zeros to np.ones in utils.filter |
python/ray/rllib/utils/filter.py
Outdated
@@ -62,7 +62,7 @@ class RunningStat(object): | |||
def __init__(self, shape=None): | |||
self._n = 0 | |||
self._M = np.zeros(shape) | |||
self._S = np.zeros(shape) | |||
self._S = np.ones(shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relevant filter change, I think.
Thanks for those fixes btw |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Going to merge this sans the filter change, since that breaks some of the filter unit tests. |
Thanks a lot for this. Is there any particular reason why this is not utilizing GPU for computation? |
What do these changes do?
Augmented random search is added as an algorithm: https://arxiv.org/pdf/1803.07055.pdf
Std deviation of meanstdfilter is initialized to 1
Related issue number