Improve discrete control offline RL benchmark #612

Open · 4 of 8 tasks
nuance1979 opened this issue Apr 25, 2022 · 10 comments · Fixed by #621
Labels
enhancement Feature that is not a new algorithm or an algorithm enhancement

Comments

@nuance1979
Collaborator

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

I wonder if anyone is actively working on improving the benchmark for discrete control offline RL policies. As I noted in examples/offline/README.md, we should use a publicly available dataset to benchmark our policies.

Currently, the best discrete control offline dataset seems to be the Atari portion of RL Unplugged. I tried to convert it into Tianshou Batch format but couldn't figure out how to get the done flag.

https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L26-L37

If I assume the data points are stored in order, I might be able to find the points where the next episode id differs from the current one and mark them as episode ends. But I don't know whether this assumption holds.
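
For example, a minimal sketch of that idea (assuming each record exposes an episode_id field and records within a shard are stored in order; I haven't verified either assumption):

import numpy as np

def infer_done_flags(episode_ids: np.ndarray) -> np.ndarray:
    """Mark a transition as terminal when the next record belongs to a different episode."""
    done = np.zeros(len(episode_ids), dtype=bool)
    done[:-1] = episode_ids[:-1] != episode_ids[1:]
    done[-1] = True  # the last record of the shard has no successor, so treat it as an end
    return done

# e.g. three episodes of lengths 3, 2, 2
print(infer_done_flags(np.array([0, 0, 0, 1, 1, 2, 2])))
# [False False  True False  True False  True]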

@Trinkle23897 What do you think? Have you worked with Reverb data before?

@Trinkle23897
Collaborator

Trinkle23897 commented Apr 25, 2022

but couldn't figure out how to get the done flag.

They always use discount as a stand-in for done (a discount of 0 marks the terminal step). Ref: https://github.com/sail-sg/envpool/blob/5b08389ec0fad903a9fb3288d54f470bc790bdfc/envpool/python/dm_envpool.py#L63

https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L227
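
So, assuming the converted arrays carry that per-step discount field (the usual dm_env convention is that the terminal step has discount 0.0), the flag could be recovered with something like:

import numpy as np

# per-transition discount values read from the shard (field name assumed)
discount = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
done = discount == 0.0  # dm_env-style data marks episode ends with a zero discount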

@Trinkle23897 Trinkle23897 added the enhancement Feature that is not a new algorithm or an algorithm enhancement label Apr 25, 2022
@nuance1979
Collaborator Author

I managed to convert a shard of the Pong dataset (Pong/run_1-00000-of-00100) into a tianshou.data.ReplayBuffer and saved it to disk as hdf5. However, the resulting hdf5 file is 53GB, whereas the original file is only 720MB. As I understand it, the original file is a gzipped TFRecord of protocol buffers. The number of samples is 498549. For RL Unplugged Atari data, observations are already frame-stacked at 4, i.e., the obs space shape is (84, 84, 4). Note that I'm talking about one file here; there are 5*100=500 similarly-sized files for each Atari game.

I wanted to do this conversion because I'd rather not keep TensorFlow installed. However, without compression, the file size becomes an issue, and the conversion itself is quite slow. It would be great if we could find some cloud storage space for the converted files.

What do you think? @Trinkle23897

@Trinkle23897
Collaborator

Trinkle23897 commented Apr 27, 2022

Maybe we can add another way to save/restore a ReplayBuffer. I remember numpy's own compressed format being much more space-efficient than pickle/hdf5 (according to my experiments at the time).

@nuance1979
Collaborator Author

Maybe we can add another way to save/restore a ReplayBuffer. I remember numpy's own compressed format being much more space-efficient than pickle/hdf5 (according to my experiments at the time).

I tried hdf5 compression first and it worked pretty well: 53GB -> 283MB.
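
Concretely (an illustrative h5py sketch; the dataset names and exact layout tianshou writes may differ), it boils down to passing compression="gzip" when creating each dataset:

import h5py
import numpy as np

# placeholder arrays standing in for the fields pulled out of the converted buffer
obs = np.zeros((1000, 84, 84, 4), dtype=np.uint8)
act = np.zeros(1000, dtype=np.int64)
rew = np.zeros(1000, dtype=np.float32)
done = np.zeros(1000, dtype=bool)

with h5py.File("pong_shard_compressed.hdf5", "w") as f:
    # uint8 Atari frames are highly repetitive, so gzip shrinks them dramatically
    for name, arr in {"obs": obs, "act": act, "rew": rew, "done": done}.items():
        f.create_dataset(name, data=arr, compression="gzip")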

@nuance1979
Collaborator Author

I have a script to convert a shard of the RL Unplugged dataset into a tianshou.data.ReplayBuffer. Each shard contains about 500k transitions. Now I want to run an experiment with 1M transitions. What is the best way to "merge" two ReplayBuffers? @Trinkle23897

I tried to use ReplayBufferManager([buf1, buf2]) but encountered a strange error. After

self.buffers = np.array(buffer_list, dtype=object)

the script failed with an error message saying that a numpy.ndarray object has no attribute options. I printed the types before and after that line and, indeed, the array of ReplayBuffer objects had been turned into an array of numpy.ndarray. The conversion also took far too long.

@Trinkle23897 Trinkle23897 linked a pull request Apr 29, 2022 that will close this issue
@Trinkle23897
Collaborator

Not sure what's happening; could you please send me the code?

@nuance1979
Collaborator Author

nuance1979 commented Apr 29, 2022

Not sure what's happening; could you please send me the code?

Sure. See attachment.

I added a break at this line to generate two small buffers of 1000 transitions (otherwise it's too slow):

print(f"...{cnt}", end="", flush=True)

Command line:

python3 ./atari_bcq.py --task BreakoutNoFrameskip-v4 --load-buffer-name ~/.rl_unplugged/buffers/Breakout/run_1-00001-of-00100.hdf5 --buffer-from-rl-unplugged --more-buffer-names ~/.rl_unplugged/buffers/Breakout/run_1-00002-of-00100.hdf5 --epoch 2 --device 'cuda:1' &> log.bcq.breakout.epoch_2.rl_unplugged.shard_1+2&

The error message:

Observations shape: (4, 84, 84)
Actions shape: 4
Traceback (most recent call last):
  File "./atari_bcq.py", line 211, in <module>
    test_discrete_bcq(get_args())
  File "./atari_bcq.py", line 143, in test_discrete_bcq
    buffer = ReplayBufferManager(bufs)
  File "/home/yi.su/git/tianshou/tianshou/data/buffer/manager.py", line 29, in __init__
    kwargs = self.buffers[0].options
AttributeError: 'numpy.ndarray' object has no attribute 'options'

atari_bcq.py.zip

@Trinkle23897
Collaborator

Trinkle23897 commented Apr 29, 2022

Is it possible to reproduce this result with an empty dataset? (Unrelated to rl-unplugged, because it would take me quite a long time to download one file...)

@nuance1979
Collaborator Author

Is it possible to reproduce this result with an empty dataset? (Unrelated to rl-unplugged, because it would take me quite a long time to download one file...)

I made minimal datasets with 5 transitions and a max size of 5, and reproduced the error. See attachment.

Breakout.zip

@Trinkle23897
Collaborator

Trinkle23897 commented Apr 30, 2022

I think the reason is that, when we developed the ReplayBufferManager, we assumed the buffers in the input buffer list are all uninitialized:

[ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer()]

But if you pass initialized buffers, they get split element-wise by numpy.array automatically (which is also the root cause of the slow speed); a standalone repro follows the example below:

[[Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)]
 [Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)]]
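
The splitting can be reproduced without tianshou at all; here SeqLike is just a hypothetical stand-in for any initialized, sequence-like buffer:

import numpy as np

class SeqLike:
    """Hypothetical stand-in for an initialized ReplayBuffer: supports len() and indexing."""
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, idx):
        return self.items[idx]

bufs = [SeqLike([1, 2, 3]), SeqLike([4, 5, 6])]
arr = np.array(bufs, dtype=object)
print(arr.shape)     # (2, 3): each buffer was unpacked element by element
print(type(arr[0]))  # <class 'numpy.ndarray'>, not SeqLike, hence no `.options`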

The normal (and probably more efficient) way:

from tianshou.data import ReplayBuffer, VectorReplayBuffer

# buffer_names: list of hdf5 shard paths; one sub-buffer per shard
vecbuf = VectorReplayBuffer(total_size=1_000_000, buffer_num=len(buffer_names))
# maybe we should manually trigger vecbuf._set_batch() first to allocate memory?
for i, name in enumerate(buffer_names):
    tmp_buf = ReplayBuffer.load_hdf5(name)
    vecbuf.buffers[i].update(tmp_buf)
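
Constructing the VectorReplayBuffer first and then copying each shard in with update() means the manager only ever builds its internal object array from its own freshly created, empty sub-buffers, so the element-wise splitting above never happens.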
