[rllib] Basic IMPALA implementation (using deepmind's reference vtrace.py) #2504

ericl · 2018-07-29T05:29:33Z

What do these changes do?

Rename AsyncSamplesOptimizer -> AsyncReplayOptimizer
Add a new AsyncSamplesOptimizer that implements the IMPALA architecture
Integrate V-trace with the A3C policy graph
Audit the V-trace integration
Benchmark against A3C, and with V-trace on and off

PongNoFrameskip-v4 with IMPALA, scaling from 16 to 128 workers: Pong is solved in under 10 minutes. For reference, solving this environment takes about 40 minutes with Ape-X and several hours with A3C.

cc @joneswong

Related issue number

#1924

ericl · 2018-07-29T05:33:15Z

python/ray/rllib/agents/impala/vtrace_policy_graph.py

@@ -0,0 +1,189 @@
+"""This is an variant of A3CPolicyGraph that uses V-trace for loss calc.


Adapted from a3c_policy_graph.py

"an variant" -> "a variant"

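For context on the "uses V-trace for loss calc" docstring: a minimal NumPy sketch of how an A3C-style loss is typically assembled from V-trace outputs, assuming the usual three terms (policy gradient, value regression, entropy bonus). The function name and the coefficient defaults are hypothetical, not taken from this PR; the sum reductions mirror the `tf.reduce_sum` that appears in the policy graph snippet reviewed further down.

```python
import numpy as np


def vtrace_loss(action_logp, pg_advantages, values, vs, entropy,
                vf_loss_coeff=0.5, entropy_coeff=0.01):
    """Combine policy, value and entropy terms into one scalar loss.

    All inputs are 1-D arrays over the timesteps of a batch. vs and
    pg_advantages come from V-trace; a TensorFlow version would wrap them
    in tf.stop_gradient so they act as fixed targets.
    """
    # Policy-gradient term: raise log-probs of actions with positive advantage.
    pi_loss = -np.sum(action_logp * pg_advantages)
    # Value term: regress the value estimates towards the V-trace targets.
    delta = vs - values
    vf_loss = 0.5 * np.sum(np.square(delta))
    # Entropy bonus: subtracted from the loss to encourage exploration.
    entropy_total = np.sum(entropy)
    total_loss = pi_loss + vf_loss_coeff * vf_loss - entropy_coeff * entropy_total
    return total_loss, pi_loss, vf_loss, entropy_total


total, pi, vf, ent = vtrace_loss(
    action_logp=np.array([-1.0, -2.0]),
    pg_advantages=np.array([1.0, 0.5]),
    values=np.zeros(2),
    vs=np.array([1.0, 2.0]),
    entropy=np.array([0.5, 0.5]),
)
print(total)  # 2.0 + 0.5 * 2.5 - 0.01 * 1.0
```

The three terms are kept separate in the return value so each can be logged as a training statistic; only `total_loss` would be minimized.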
ericl · 2018-07-29T05:33:30Z

python/ray/rllib/agents/impala/vtrace.py

@@ -0,0 +1,300 @@
+# Copyright 2018 Google LLC


Code inlined verbatim
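The inlined vtrace.py is TensorFlow code over time-major `[T, B]` tensors. As a reading aid, here is a minimal single-trajectory NumPy sketch of the same V-trace recursion from the IMPALA paper: truncated importance weights `rho_t = min(rho_bar, pi/mu)` and trace-cutting coefficients `c_t = min(1, pi/mu)`, accumulated backwards in time. The function name is hypothetical and this is not the module's API; argument names only loosely resemble it.

```python
import numpy as np


def vtrace_targets(log_rhos, discounts, rewards, values, bootstrap_value,
                   clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
    """Return (vs, pg_advantages) for one trajectory of length T.

    log_rhos: log(pi(a_t|x_t) / mu(a_t|x_t)), target vs. behaviour policy.
    discounts: gamma per step (0 where an episode terminates).
    bootstrap_value: V(x_T), the value estimate after the last step.
    """
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)  # rho_t
    cs = np.minimum(1.0, rhos)                            # c_t (trace cutting)
    values_t_plus_1 = np.append(values[1:], bootstrap_value)
    # Temporal-difference errors weighted by the clipped importance ratios.
    deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

    # Backward recursion: v_s - V(x_s) = delta_s + gamma_s * c_s * (v_{s+1} - V(x_{s+1})).
    vs_minus_v = np.zeros_like(values, dtype=np.float64)
    acc = 0.0
    for t in reversed(range(len(values))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = vs_minus_v + values

    # Policy-gradient advantages bootstrap from the next V-trace target.
    vs_t_plus_1 = np.append(vs[1:], bootstrap_value)
    clipped_pg_rhos = np.minimum(clip_pg_rho_threshold, rhos)
    pg_advantages = clipped_pg_rhos * (rewards + discounts * vs_t_plus_1 - values)
    return vs, pg_advantages


rewards = np.array([1.0, 0.0, 2.0, 1.0])
values = np.array([0.5, 0.3, 0.2, 0.1])
discounts = np.full(4, 0.9)
vs, adv = vtrace_targets(np.zeros(4), discounts, rewards, values, 0.4)
```

With `log_rhos = 0` (fully on-policy) every weight is 1 and the targets reduce to ordinary bootstrapped n-step returns; ratios above 1 are clipped, so under the default thresholds they behave the same way.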

ericl · 2018-07-29T05:34:12Z

python/ray/rllib/optimizers/async_replay_optimizer.py

@@ -0,0 +1,295 @@
+"""Implements Distributed Prioritized Experience Replay.


Moved from async_samples_optimizer.py
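For readers less familiar with the replay side that this module keeps, a toy sketch of proportional prioritization as described in the Prioritized Experience Replay paper: sample index i with probability p_i^alpha / sum_k p_k^alpha and correct the sampling bias with importance weights (N * P(i))^-beta. The class and parameter names are hypothetical; this illustrates the technique and is not the code in async_replay_optimizer.py.

```python
import numpy as np


class ProportionalReplay:
    """Toy proportional prioritized replay buffer (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha
        self.items = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.next_idx = 0
        self.rng = np.random.default_rng(seed)

    def add(self, item, priority):
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            self.items[self.next_idx] = item  # ring buffer: overwrite the oldest
        self.priorities[self.next_idx] = priority
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.items)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()  # P(i) = p_i^alpha / sum_k p_k^alpha
        idxes = self.rng.choice(n, size=batch_size, p=probs)
        weights = (n * probs[idxes]) ** (-beta)  # importance-sampling correction
        weights /= weights.max()  # simplification: normalize by the batch max
        return [self.items[i] for i in idxes], weights, idxes

    def update_priorities(self, idxes, new_priorities):
        # Typically called with fresh |TD error| values after a learner step.
        self.priorities[idxes] = new_priorities
```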

AmplabJenkins · 2018-07-29T07:00:02Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6976/

AmplabJenkins · 2018-07-29T07:07:31Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6977/

ericl · 2018-07-29T07:41:27Z

python/ray/rllib/optimizers/async_samples_optimizer.py

@@ -1,108 +1,28 @@
-"""Implements Distributed Prioritized Experience Replay.
+"""Implements the IMPALA architecture.


Adapted from #2147
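To illustrate the decoupling that the IMPALA architecture relies on, a toy single-process sketch using threads and a bounded queue: actors push sample batches tagged with the policy version they were generated with, and a learner consumes them without the actors ever waiting for a training step. All names here are hypothetical; the optimizer in this PR coordinates remote Ray workers rather than local threads.

```python
import queue
import threading


class Learner(threading.Thread):
    """Consumes sample batches from a queue and 'trains' on them (toy)."""

    def __init__(self, inqueue):
        super().__init__(daemon=True)
        self.inqueue = inqueue
        self.weights_version = 0
        self.steps_trained = 0
        self.staleness = []  # how many versions behind each batch was

    def run(self):
        while True:
            batch = self.inqueue.get()
            if batch is None:  # sentinel: no more data
                break
            self.staleness.append(self.weights_version - batch["policy_version"])
            self.steps_trained += len(batch["obs"])
            self.weights_version += 1  # stand-in for one SGD step


def actor_loop(actor_id, learner, inqueue, num_batches, batch_size):
    for _ in range(num_batches):
        # Read whatever weights are current; never block on the learner.
        version = learner.weights_version
        batch = {"obs": [actor_id] * batch_size, "policy_version": version}
        inqueue.put(batch)


def run_impala_toy(num_actors=4, num_batches=5, batch_size=10):
    inqueue = queue.Queue(maxsize=16)
    learner = Learner(inqueue)
    learner.start()
    actors = [
        threading.Thread(target=actor_loop,
                         args=(i, learner, inqueue, num_batches, batch_size))
        for i in range(num_actors)
    ]
    for a in actors:
        a.start()
    for a in actors:
        a.join()
    inqueue.put(None)  # tell the learner to stop
    learner.join()
    return learner


learner = run_impala_toy()
print(learner.steps_trained, max(learner.staleness))
```

The recorded `staleness` is the policy lag between actors and learner; that off-policy gap is what V-trace corrects for.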

AmplabJenkins · 2018-07-29T07:47:46Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6979/

AmplabJenkins · 2018-07-29T07:52:27Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6980/

AmplabJenkins · 2018-07-29T08:16:24Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6981/

AmplabJenkins · 2018-07-31T00:38:18Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7032/

AmplabJenkins · 2018-07-31T01:06:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7035/

AmplabJenkins · 2018-07-31T01:14:40Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7038/

AmplabJenkins · 2018-07-31T04:43:35Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7040/

richardliaw · 2018-07-31T05:01:03Z

Are the 16 workers running A3C? Did you also try A3C with vectorized envs?

ericl · 2018-07-31T05:31:41Z

No, this is all IMPALA. A3C takes hours to get to the same point.

AmplabJenkins · 2018-07-31T05:37:17Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7041/

AmplabJenkins · 2018-07-31T06:18:53Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7042/

AmplabJenkins · 2018-07-31T06:22:20Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7043/

richardliaw · 2018-08-01T21:01:22Z

python/ray/rllib/agents/impala/vtrace_policy_graph.py

+ tf.float32))
+
+ # The policy gradients loss
+ self.pi_loss = -tf.reduce_sum(


These should be reduce_mean now that concatenation is removed.

The DeepMind implementation seems to use reduce_sum; we can either keep it or change it together with A3C.
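The trade-off discussed here is one of scale: over a batch of T timesteps the summed loss is exactly T times the mean, so its gradient is T times larger as well. A small NumPy illustration, with a hypothetical helper name:

```python
import numpy as np


def pg_loss(action_logp, advantages, reduction="sum"):
    """Policy-gradient loss -logp * A, reduced by "sum" or "mean" over timesteps."""
    per_step = -action_logp * advantages
    if reduction == "sum":
        return per_step.sum()
    if reduction == "mean":
        return per_step.mean()
    raise ValueError("reduction must be 'sum' or 'mean'")


rng = np.random.default_rng(0)
T = 500  # timesteps in one train batch
logp = np.log(rng.uniform(0.1, 1.0, size=T))
adv = rng.normal(size=T)

loss_sum = pg_loss(logp, adv, "sum")
loss_mean = pg_loss(logp, adv, "mean")
# The summed loss (and hence its gradient) is exactly T times the mean.
print(loss_sum, T * loss_mean)
```

With plain SGD that is equivalent to multiplying the learning rate by T, so switching reductions means retuning it; adaptive optimizers such as RMSProp or Adam normalize most of the scale away, apart from their epsilon term.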

richardliaw · 2018-08-01T21:08:13Z

doc/source/rllib-training.rst

+
+.. code-block:: bash
+
+ python ray/python/ray/rllib/train.py -f /path/to/tuned/example.yaml


rllib train -f ...

I don't think this works without a pip install, so I'm going to keep it for now.

AmplabJenkins · 2018-08-01T22:09:38Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7113/

AmplabJenkins · 2018-08-01T22:30:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7112/

AmplabJenkins · 2018-08-01T23:37:52Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7115/

AmplabJenkins · 2018-08-02T00:51:21Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7119/

ericl · 2018-08-02T03:53:40Z

Everything else looks good; merging after yapf.

ericl added 13 commits July 25, 2018 14:39

initial reorg

980875f

fix

2ffaf5f

wip

e2ff2d7

move

f8d4c83

wip

46c07fe

params

f0fcce7

hyperparams

1fd5148

steps

c57490e

prefetch

f9a3fd7

wip

1cf1ddb

runs

ef3f9de

update

7ae6689

remove dead code

7b0cae4

ericl assigned richardliaw and unassigned richardliaw Jul 29, 2018

ericl commented Jul 29, 2018

license

d69ee54

ericl force-pushed the impala branch from 4170022 to d69ee54 on July 29, 2018 05:41

ericl added 3 commits July 28, 2018 23:14

add stats

dec3649

always emit stats

550f244

a3c stats

0eb4057

ericl force-pushed the impala branch from f14eb2e to 0eb4057 on July 29, 2018 06:21

Update vtrace_policy_graph.py

ce3f96b

ericl commented Jul 29, 2018

switch to deepmind prep

cc3e6cd

ericl added 2 commits July 30, 2018 21:07

add tuned example

ab4e099

revert prep changes

8152d29

ericl force-pushed the impala branch from 3d37e2e to 8152d29 on July 31, 2018 04:13

fix link

45d1673

ericl force-pushed the impala branch from b0ffc99 to 45d1673 on July 31, 2018 04:55

comment

48b3372

improve image

986e2fa

richardliaw approved these changes Aug 1, 2018

richardliaw reviewed Aug 1, 2018

ericl added 3 commits August 1, 2018 14:20

update docs

0afddfe

sum gen

79816bc

Merge remote-tracking branch 'upstream/master' into impala

a865232

clip rewards fix

b361207

yapf

b6dedbe

ericl merged commit 9ea57c2 into ray-project:master Aug 2, 2018
