Recording Expert Data from myself in Discrete Action Space #319

Open
JankyOo opened this issue May 10, 2019 · 2 comments
Labels
question Further information is requested

Comments


JankyOo commented May 10, 2019

Good morning,

I am trying to pretrain my A2C agent with expert data, so I would like to record myself playing Super Mario (SuperMarioAdvance4 for Game Boy Advance). As documented, the pretrain() method only works with a discrete action space.

I can record myself and write all the needed data to a .npz file, similar to the file created by generate_expert_traj().

Now the problem: unfortunately, I can only record with a MultiBinary action space, not with the required discrete one. I read through the documentation and tried to figure out how the discrete action space is encoded, but I could not find a solution.

Is there any way to translate a MultiBinary action space into a discrete one? Or is there something like a map that shows which action Mario takes for each discrete number?

@JankyOo JankyOo changed the title Recording Expert Data from myself Recording Expert Data from myself in Descrete Action Space May 10, 2019
@araffin araffin added the question Further information is requested label May 10, 2019
Collaborator

araffin commented May 10, 2019

> Is there any way to translate a MultiBinary action space into a discrete one?

Well, MultiBinary actions are in {0, 1}^n, and there is a bijective mapping from {0, 1}^n to [[0, m]] (the discrete actions), where m = 2^n - 1: read the n bits as the binary representation of an integer.

An example with n = 2:
[0, 0] -> 0 = 0 x 2^1 + 0 x 2^0
[0, 1] -> 1 = 0 x 2^1 + 1 x 2^0
[1, 0] -> 2 = 1 x 2^1 + 0 x 2^0
[1, 1] -> 3 = 1 x 2^1 + 1 x 2^0

That way, you can easily map MultiBinary actions to discrete ones and use the pretrain() method.
It would be better to add MultiBinary support directly to the pretrain() method, but I have no time for that right now.
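
The mapping described above can be sketched in a few lines of plain Python. This is a minimal sketch, not part of stable-baselines: the function names `multibinary_to_discrete` and `discrete_to_multibinary` are hypothetical, and it assumes the first element of the array is the most significant bit, as in the n = 2 example. Note that it yields a discrete space of size 2^n, which is not necessarily the encoding a given environment uses for its own discrete action space.

```python
def multibinary_to_discrete(bits):
    """Read a sequence of 0/1 values as one integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value


def discrete_to_multibinary(value, n):
    """Inverse mapping: expand an integer in [0, 2**n - 1] into n bits."""
    if not 0 <= value < 2 ** n:
        raise ValueError("value does not fit into n bits")
    return [(value >> shift) & 1 for shift in range(n - 1, -1, -1)]


# Convert a batch of recorded MultiBinary actions (one row per step).
recorded_actions = [[0, 0], [0, 1], [1, 0], [1, 1]]
discrete_actions = [multibinary_to_discrete(row) for row in recorded_actions]
print(discrete_actions)  # [0, 1, 2, 3]
```

The same function should also accept the rows of a NumPy array, so actions loaded from a recorded .npz file could be converted row by row before being saved again.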

Author

JankyOo commented May 13, 2019

Good morning araffin,

thanks for the fast response.

Unfortunately, I could not implement a solution based on your answer. I think I understood what you were saying, but my tests did not work out as expected.

    import retro
    from stable_baselines.common.vec_env import DummyVecEnv

    env = retro.make("SuperMarioAdvance4-Gba", state="Level1_1", scenario="scenario", use_restricted_actions=retro.Actions.DISCRETE)
    env = DummyVecEnv([lambda: env])
    print(env.action_space)

This code prints "Discrete(288)", so I expect there are 288 different numbers that SuperMarioAdvance4-Gba can work with.

    env = retro.make("SuperMarioAdvance4-Gba", state="Level1_1", scenario="scenario")
    env = DummyVecEnv([lambda: env])
    print(env.action_space)

This code prints "MultiBinary(12)", so the action array has 12 values, one for each button (plus 2 None values).
By trial and error I can map the 12 bits of the MultiBinary array to the different buttons.
For example, bit 9 is button B, so Mario jumps; bit 7 is the left button, so Mario walks left.
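
The trial-and-error mapping of bits to buttons can be made systematic by pressing each button on its own. The following is only a sketch under stated assumptions: `one_hot_actions` and `probe_buttons` are hypothetical helpers, and `step_fn` stands in for whatever wrapper around `env.step` (plus rendering) is used to watch the game.

```python
def one_hot_actions(n_buttons):
    """Build one MultiBinary action per button, each pressing only that button."""
    return [[1 if j == i else 0 for j in range(n_buttons)] for i in range(n_buttons)]


def probe_buttons(step_fn, n_buttons, hold_frames=30):
    """Hold each button alone for a few frames so its effect can be observed.

    step_fn is a hypothetical callback, e.g. a small wrapper around env.step
    that also renders the game.
    """
    for bit, action in enumerate(one_hot_actions(n_buttons)):
        print("probing bit", bit)
        for _ in range(hold_frames):
            step_fn(action)


# Dry run with a stand-in callback that only records what was pressed.
pressed = []
probe_buttons(pressed.append, n_buttons=12, hold_frames=1)
```

In the dry run, `pressed` ends up holding the 12 single-button actions in order, which is the sequence that would be sent to the environment.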

Afterwards I tried discrete actions that fit your explanation, or rather my understanding of it, and the outcome was not what I expected.

Test code:

    done = False
    while not done:
        action = discrete_value  # placeholder: the discrete action being tested
        obs, rew, done, info = env.step(action)

So discrete 64 (as a MultiBinary array, i.e. the binary value of the discrete number: 000001000000) makes Mario walk left, so apparently the bit at position 6 is the left button. But discrete 4 (binary: 001000000000) makes Mario go left again... so is the bit at position 3 the left button as well?

By going through some numbers by trial and error I could only make Mario go left and right (each with several different values), but I could not find jumping (button B) or anything else.

All in all I got confused, because I could not match one bit to one action: either there was no in-game reaction for a bit, or different bits (discrete values) produced the same action.

Sorry for the long issue; I hope it is not an unnecessary bother caused by my misunderstanding.
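
To check a bit-position hypothesis like the one above, it can help to print the binary expansion of each discrete value in both bit orders. The sketch below is plain Python with a hypothetical helper `to_bits`; it only illustrates the arithmetic and makes no assumption about how the environment encodes its discrete actions. It also shows that 12 independent on/off buttons would give 2^12 = 4096 combinations, which does not match the reported Discrete(288), so that action space is presumably not the plain binary encoding of all 12 bits.

```python
def to_bits(value, n, msb_first=True):
    """Expand value into n bits; bit i of the integer has weight 2**i."""
    bits = [(value >> i) & 1 for i in range(n)]  # least significant bit first
    return bits[::-1] if msb_first else bits


for value in (4, 64):
    msb = "".join(str(b) for b in to_bits(value, 12))
    lsb = "".join(str(b) for b in to_bits(value, 12, msb_first=False))
    print(value, "MSB first:", msb, "LSB first:", lsb)

# Number of combinations of 12 independent on/off buttons.
print(2 ** 12)  # 4096
```

With the most significant bit first, 4 is 000000000100 and 64 is 000001000000; with the least significant bit first, 4 is 001000000000 and 64 is 000000100000. The two strings quoted above therefore appear to mix both orders.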

@araffin araffin changed the title Recording Expert Data from myself in Descrete Action Space Recording Expert Data from myself in Discrete Action Space May 13, 2019