Improve discrete control offline RL benchmark #612

Open · 4 of 8 tasks
nuance1979 opened this issue Apr 25, 2022 · 10 comments · Fixed by #621
Labels
enhancement Feature that is not a new algorithm or an algorithm enhancement

Comments

@nuance1979
Collaborator

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

I wonder if anyone is actively working on improving the benchmark for discrete control offline RL policies. As I noted in examples/offline/README.md, we should use a publicly available dataset to benchmark our policies.

Currently, the best discrete control offline dataset seems to be the Atari portion of RL Unplugged. I tried to convert it into Tianshou Batch format but couldn't figure out how to get the done flag.

https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L26-L37

If I assume the data points are stored in order, I might be able to find the points where the next episode id differs from the current one and mark them as episode ends. But I don't know whether this assumption holds.
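
For example, a minimal sketch of that idea (assuming each record exposes an episode_id field and records within a shard are stored in order; I haven't verified either assumption):

import numpy as np

def infer_done_flags(episode_ids: np.ndarray) -> np.ndarray:
    """Mark a transition as terminal when the next record belongs to a different episode."""
    done = np.zeros(len(episode_ids), dtype=bool)
    done[:-1] = episode_ids[:-1] != episode_ids[1:]
    done[-1] = True  # the last record of the shard has no successor, so treat it as an end
    return done

# e.g. three episodes of lengths 3, 2, 2
print(infer_done_flags(np.array([0, 0, 0, 1, 1, 2, 2])))
# [False False  True False  True False  True]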

@Trinkle23897 What do you think? Have you worked with Reverb data before?

@Trinkle23897
Collaborator

Trinkle23897 commented Apr 25, 2022

but couldn't figure out how to get the done flag.

They always use discount as a stand-in for done (a discount of 0 marks the terminal step). Ref: https://github.com/sail-sg/envpool/blob/5b08389ec0fad903a9fb3288d54f470bc790bdfc/envpool/python/dm_envpool.py#L63

https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L227
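
So, assuming the converted arrays carry that per-step discount field (the usual dm_env convention is that the terminal step has discount 0.0), the flag could be recovered with something like:

import numpy as np

# per-transition discount values read from the shard (field name assumed)
discount = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
done = discount == 0.0  # dm_env-style data marks episode ends with a zero discount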

@Trinkle23897 Trinkle23897 added the enhancement Feature that is not a new algorithm or an algorithm enhancement label Apr 25, 2022
@nuance1979
Collaborator Author

I managed to convert a shard of the Pong dataset (Pong/run_1-00000-of-00100) into a tianshou.data.ReplayBuffer and saved it to disk as hdf5. However, the resulting hdf5 file is 53GB, whereas the original file is only 720MB. As I understand it, the original file is a gzipped TFRecord of protocol buffers. The number of samples is 498549. For RL Unplugged Atari data, observations are already frame-stacked at 4, i.e., the obs space shape is (84, 84, 4). Note that I'm talking about one file here; there are 5*100=500 similarly-sized files for each Atari game.

I wanted to do this conversion because I'd rather not keep TensorFlow installed. However, without compression, the file size becomes an issue, and the conversion itself is quite slow. It would be great if we could find some cloud storage space for the converted files.

What do you think? @Trinkle23897

@Trinkle23897
Collaborator

Trinkle23897 commented Apr 27, 2022

Maybe we can add another way to save/restore a ReplayBuffer. I remember numpy's own compressed format being much more space-efficient than pickle/hdf5 (according to my experiments at the time).

@nuance1979
Collaborator Author

Maybe we can add another way to save/restore a ReplayBuffer. I remember numpy's own compressed format being much more space-efficient than pickle/hdf5 (according to my experiments at the time).

I tried hdf5 compression first and it worked pretty well: 53GB -> 283MB.
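
Concretely (an illustrative h5py sketch; the dataset names and exact layout tianshou writes may differ), it boils down to passing compression="gzip" when creating each dataset:

import h5py
import numpy as np

# placeholder arrays standing in for the fields pulled out of the converted buffer
obs = np.zeros((1000, 84, 84, 4), dtype=np.uint8)
act = np.zeros(1000, dtype=np.int64)
rew = np.zeros(1000, dtype=np.float32)
done = np.zeros(1000, dtype=bool)

with h5py.File("pong_shard_compressed.hdf5", "w") as f:
    # uint8 Atari frames are highly repetitive, so gzip shrinks them dramatically
    for name, arr in {"obs": obs, "act": act, "rew": rew, "done": done}.items():
        f.create_dataset(name, data=arr, compression="gzip")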

@nuance1979
Collaborator Author

I have a script to convert a shard of the RL Unplugged dataset into a tianshou.data.ReplayBuffer. Each shard contains about 500k transitions. Now I want to run an experiment with 1M transitions. What is the best way to "merge" two ReplayBuffers? @Trinkle23897

I tried to use ReplayBufferManager([buf1, buf2]) but encountered a strange error. After

self.buffers = np.array(buffer_list, dtype=object)

the script failed with an error message saying that a numpy.ndarray object has no attribute options. I printed the types before and after that line and, indeed, the array of ReplayBuffer objects had been turned into an array of numpy.ndarray. The conversion also took far too long.

@Trinkle23897 Trinkle23897 linked a pull request Apr 29, 2022 that will close this issue
@Trinkle23897
Collaborator

Not sure what's happening; could you please send me the code?

@nuance1979
Collaborator Author

nuance1979 commented Apr 29, 2022

Not sure what's happening; could you please send me the code?

Sure. See attachment.

I added a break at this line to generate two small buffers of 1000 transitions (otherwise it's too slow):

print(f"...{cnt}", end="", flush=True)

Command line:

python3 ./atari_bcq.py --task BreakoutNoFrameskip-v4 --load-buffer-name ~/.rl_unplugged/buffers/Breakout/run_1-00001-of-00100.hdf5 --buffer-from-rl-unplugged --more-buffer-names ~/.rl_unplugged/buffers/Breakout/run_1-00002-of-00100.hdf5 --epoch 2 --device 'cuda:1' &> log.bcq.breakout.epoch_2.rl_unplugged.shard_1+2&

The error message:

Observations shape: (4, 84, 84)
Actions shape: 4
Traceback (most recent call last):
  File "./atari_bcq.py", line 211, in <module>
    test_discrete_bcq(get_args())
  File "./atari_bcq.py", line 143, in test_discrete_bcq
    buffer = ReplayBufferManager(bufs)
  File "/home/yi.su/git/tianshou/tianshou/data/buffer/manager.py", line 29, in __init__
    kwargs = self.buffers[0].options
AttributeError: 'numpy.ndarray' object has no attribute 'options'

atari_bcq.py.zip

@Trinkle23897
Collaborator

Trinkle23897 commented Apr 29, 2022

Is it possible to reproduce this result with an empty dataset? (Unrelated to rl-unplugged, because it would take me quite a long time to download one file...)

@nuance1979
Collaborator Author

Is it possible to reproduce this result with an empty dataset? (Unrelated to rl-unplugged, because it would take me quite a long time to download one file...)

I made minimal datasets with 5 transitions and a max size of 5, and reproduced the error. See attachment.

Breakout.zip

@Trinkle23897
Collaborator

Trinkle23897 commented Apr 30, 2022

I think the reason is that, when we developed the ReplayBufferManager, we assumed the buffers in the input buffer list are all uninitialized:

[ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer()]

But if you pass initialized buffers, they get split element-wise by numpy.array automatically (which is also the root cause of the slow speed); a standalone repro follows the example below:

[[Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)]
 [Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)]]
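
The splitting can be reproduced without tianshou at all; here SeqLike is just a hypothetical stand-in for any initialized, sequence-like buffer:

import numpy as np

class SeqLike:
    """Hypothetical stand-in for an initialized ReplayBuffer: supports len() and indexing."""
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, idx):
        return self.items[idx]

bufs = [SeqLike([1, 2, 3]), SeqLike([4, 5, 6])]
arr = np.array(bufs, dtype=object)
print(arr.shape)     # (2, 3): each buffer was unpacked element by element
print(type(arr[0]))  # <class 'numpy.ndarray'>, not SeqLike, hence no `.options`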

The normal (and probably more efficient) way:

from tianshou.data import ReplayBuffer, VectorReplayBuffer

# buffer_names: list of hdf5 shard paths; one sub-buffer per shard
vecbuf = VectorReplayBuffer(total_size=1_000_000, buffer_num=len(buffer_names))
# maybe we should manually trigger vecbuf._set_batch() first to allocate memory?
for i, name in enumerate(buffer_names):
    tmp_buf = ReplayBuffer.load_hdf5(name)
    vecbuf.buffers[i].update(tmp_buf)
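
Constructing the VectorReplayBuffer first and then copying each shard in with update() means the manager only ever builds its internal object array from its own freshly created, empty sub-buffers, so the element-wise splitting above never happens.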
