Improve discrete control offline RL benchmark #612
I managed to convert a shard of the Pong dataset. I wanted to do this conversion because I'd rather not keep TensorFlow installed. However, without compression the file size becomes an issue, and the conversion is quite slow. It would be great if we could find some cloud storage space for the converted files. What do you think? @Trinkle23897
Maybe we can add another way of `ReplayBuffer` save/restore. I remember numpy's own compression being much more efficient than pickle/hdf5 (according to my experiments at the time).
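For reference, a minimal sketch of that numpy-native route (array names are hypothetical; `np.savez_compressed` stores each array zlib-compressed inside a `.npz` zip archive, with no pickling needed for plain ndarrays):

```python
import numpy as np

# Hypothetical transition arrays standing in for one converted shard.
obs = np.zeros((1000, 84, 84), dtype=np.uint8)
act = np.zeros(1000, dtype=np.int64)
rew = np.zeros(1000, dtype=np.float32)
done = np.zeros(1000, dtype=bool)

# Each named array is DEFLATE-compressed inside a .npz (zip) archive.
np.savez_compressed("pong_shard.npz", obs=obs, act=act, rew=rew, done=done)

# Loading is lazy: each array is decompressed on first access by key.
data = np.load("pong_shard.npz")
obs_restored = data["obs"]
```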
I tried HDF5 compression first and it worked pretty well: 53 GB -> 283 MB.
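As a sketch of what that looks like with plain h5py (dataset names are hypothetical; gzip is h5py's built-in lossless filter, and mostly-redundant uint8 Atari frames compress extremely well, which is consistent with a ratio like the one above):

```python
import h5py
import numpy as np

# Hypothetical shard contents.
obs = np.zeros((1000, 84, 84), dtype=np.uint8)
act = np.zeros(1000, dtype=np.int64)

with h5py.File("pong_shard.hdf5", "w") as f:
    # compression_opts=4 is gzip's default level; 9 is smallest but slowest.
    f.create_dataset("obs", data=obs, compression="gzip", compression_opts=4)
    f.create_dataset("act", data=act, compression="gzip", compression_opts=4)

with h5py.File("pong_shard.hdf5", "r") as f:
    obs_restored = f["obs"][:]  # decompression is transparent on read
```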
I have a script to convert a shard of the RL Unplugged dataset into a `ReplayBuffer`. I tried to use a `ReplayBufferManager([buf1, buf2])` but encountered a strange error. After this line:

tianshou/tianshou/data/buffer/manager.py, line 26 at 7f23748

a `numpy.ndarray` object doesn't have attribute `options`. I printed out the types before and after that line and indeed, an array of `ReplayBuffer` was transformed into an array of `numpy.ndarray`. Also, the transformation took far too long.
Not sure what happened; could you please send me the code?
Sure. See attachment. I added a

Command line:

The error message:
Is it possible to use an empty dataset to reproduce this? (Unrelated to RL Unplugged; it takes me quite a long time to download even one file...)
I made minimal datasets with 5 transitions and a max size of 5 and reproduced the error. See attachment.
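A minimal repro sketch along those lines, assuming tianshou's 0.4.x API (where `ReplayBuffer.add` accepts a single-transition `Batch`); the last line is what triggers the reported failure:

```python
from tianshou.data import Batch, ReplayBuffer, ReplayBufferManager

def make_tiny_buffer(size=5):
    """Build an already-initialized buffer holding `size` dummy transitions."""
    buf = ReplayBuffer(size)
    for i in range(size):
        buf.add(Batch(obs=i, act=0, rew=0.0, done=(i == size - 1),
                      obs_next=i + 1, info={}))
    return buf

buf1, buf2 = make_tiny_buffer(), make_tiny_buffer()
# Passing *initialized* buffers is what breaks: the manager's np.array call
# unpacks each buffer into its transitions instead of keeping it whole.
manager = ReplayBufferManager([buf1, buf2])
```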
I think the reason is that when we developed the ReplayBufferManager, we assumed the buffers in the input buffer list are all uninitialized:

```
[ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(),
 ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer()]
```

But if you pass initialized buffers, `numpy.array` automatically splits each buffer into its individual transitions (and that's the root cause of the slow speed):

```
[[Batch(...) Batch(...) Batch(...) Batch(...) Batch(...)]
 [Batch(...) Batch(...) Batch(...) Batch(...) Batch(...)]]
```

The normal (and maybe more efficient) way is to create an empty vector buffer and load each shard into its sub-buffers via `update`:

```python
vecbuf = VectorReplayBuffer(total_size, buffer_num)  # sizes must be given up front
# maybe we should manually trigger vecbuf._set_batch() first to allocate memory?
for i, name in enumerate(buffer_names):
    tmp_buf = ReplayBuffer.load_hdf5(name)
    vecbuf.buffers[i].update(tmp_buf)
```
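A small standalone sketch of the numpy behavior behind this, using a hypothetical `SeqLike` class standing in for an initialized `ReplayBuffer`: anything exposing `__len__` and `__getitem__` is treated by `np.array` as a nested sequence and unpacked element by element, which accounts for both the type change and the slowdown observed above.

```python
import numpy as np

class SeqLike:
    """Stands in for an initialized ReplayBuffer: has __len__ and __getitem__."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, i):
        return i  # a real buffer would return a Batch here

arr = np.array([SeqLike(5), SeqLike(5)])
print(arr.shape)     # (2, 5): each SeqLike was unpacked into its elements
print(type(arr[0]))  # <class 'numpy.ndarray'>, not SeqLike
```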
I wonder if anyone is actively working on improving the benchmark for discrete control offline RL policies. As I noted in examples/offline/README.md, we should use a publicly available dataset to benchmark our policies.

Currently the best discrete control offline datasets seem to be the Atari portion of RL Unplugged. I tried to convert them into Tianshou `Batch` objects but couldn't figure out how to get the `done` flag:

https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L26-L37

If I assume the data points are in order, then I might be able to find the points where the next episode ID differs from the current one and mark those transitions as episode ends. But I don't know if this assumption holds.
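Under that ordering assumption, the flag could be derived with a shift-compare; `episode_id` here is a hypothetical per-transition array, and the final transition of the shard is also treated as an episode end:

```python
import numpy as np

# Hypothetical per-transition episode IDs, in on-disk order.
episode_id = np.array([7, 7, 7, 8, 8, 9])

# A transition is terminal when the next transition belongs to a new episode.
done = np.empty(len(episode_id), dtype=bool)
done[:-1] = episode_id[1:] != episode_id[:-1]
done[-1] = True  # shard boundary: assume the last transition ends its episode

print(done)  # [False False  True False  True  True]
```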
@Trinkle23897 What do you think? Have you worked with Reverb data before?