how does the privileged observation work? #33

Open
rendashuai17 opened this issue Jan 9, 2023 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@rendashuai17

Thanks for your great contribution!

I noticed that you use the privileged observations as the critic observations for asymmetric training in PPO, but you haven't mentioned this in the paper.
Could you please explain this part more clearly?

Also, I noticed that in other works by your team the privileged observations are used for distillation, where they can be reconstructed in the student policy. Are the two kinds of privileged observations the same? If so, how does it work?

@rendashuai17 rendashuai17 added the bug Something isn't working label Jan 9, 2023
@nikitardn
Collaborator

Hi,

The privileged observations feature is implemented, but we're not using it in the paper. These privileged observations are not used for teacher-student distillation.
Instead, they are used for asymmetric actor-critic training, where the critic receives more information than the actor. This lets the critic use information that won't be available on a real robot.
Teacher-student distillation is a second step that has to be done after the RL training. It is not implemented in this code.
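
To make the asymmetric setup concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the class names, dimensions, and the tiny linear "networks" are all hypothetical stand-ins, chosen only to show that the actor and the critic take inputs of different sizes.

```python
import random


class Linear:
    """Tiny dense layer, a hypothetical stand-in for a real neural network."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        if len(x) != len(self.w[0]):
            raise ValueError(f"expected {len(self.w[0])} inputs, got {len(x)}")
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]


class AsymmetricActorCritic:
    """Actor sees the regular observations; critic sees the privileged ones."""

    def __init__(self, num_obs, num_privileged_obs, num_actions, seed=0):
        rng = random.Random(seed)
        self.actor = Linear(num_obs, num_actions, rng)    # deployable on the robot
        self.critic = Linear(num_privileged_obs, 1, rng)  # only needed during training

    def act(self, obs):
        return self.actor(obs)

    def evaluate(self, privileged_obs):
        return self.critic(privileged_obs)[0]


# Assumed sizes, for illustration only.
NUM_OBS, NUM_EXTRA, NUM_ACTIONS = 4, 2, 3
model = AsymmetricActorCritic(NUM_OBS, NUM_OBS + NUM_EXTRA, NUM_ACTIONS)

obs = [0.1, -0.2, 0.3, 0.0]      # what the robot's sensors would provide
extra = [0.8, 1.2]               # simulator-only quantities (hypothetical)
privileged_obs = obs + extra     # critic input: regular obs plus extra information

action = model.act(obs)                  # policy output, uses obs only
value = model.evaluate(privileged_obs)   # value estimate, uses privileged obs
```

Because only the critic consumes `privileged_obs`, the actor can be run on hardware unchanged once training is over; the critic is simply discarded. No student network is involved at any point, which is the difference from teacher-student distillation.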

Hopefully, this clarifies the distinction.

@nikitardn nikitardn added help wanted Extra attention is needed and removed bug Something isn't working labels Jan 16, 2023