
Clarification on the relationship between num_unroll_steps and infer_context_length in UniZero #248

Closed
Tiikara opened this issue Jul 24, 2024 · 9 comments
Labels
algorithm New algorithm config New or improved configuration polish Polish algorithms, tests or configs

Comments

@Tiikara

Tiikara commented Jul 24, 2024

I've been studying the UniZero implementation and I have a question about two key parameters:

num_unroll_steps = 10
infer_context_length = 4

I noticed that these values are different, and I'm curious about the reasoning behind this design choice. Specifically:

  1. Why is num_unroll_steps set higher than infer_context_length?
  2. Does this configuration affect the training or inference efficiency?
  3. Would there be any drawbacks to setting these values equal to each other?

I'd greatly appreciate any insights you could provide on the rationale behind these parameter choices and their impact on the model's performance and behavior.

Thank you for your time and for creating this interesting algorithm.

@puyuan1996
Collaborator

puyuan1996 commented Jul 24, 2024

  • Hello, thank you very much for your attention.
  • num_unroll_steps is the context length during training (i.e., the length of the training sequence); the name is kept for consistency with the MuZero algorithm, and we may rename it to train_context_length in the future. infer_context_length is the context length during collect/eval (i.e., only the most recent infer_context_length steps of the KV cache are retained for inference).
  • The reason we set num_unroll_steps larger than infer_context_length is that our current implementation uses the classic nn.Embedding for positional encoding rather than more recent schemes such as RoPE, so its extrapolation ability is limited. At the same time, we want the model to see a longer context during training (referencing this paper, predicting further into the future may help it form some global structure), while choosing infer_context_length based on task characteristics at inference time. We set infer_context_length = 4 because stacking 4 frames is generally sufficient for optimal decision-making in Atari tasks. Larger values of both num_unroll_steps and infer_context_length require more CUDA memory and may perform better, so we choose their values by weighing compute cost against task characteristics; an adaptive method is a possible future direction.
  • Thank you again for your attention, and feel free to raise any questions or suggestions at any time.
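The relationship described above can be sketched as a minimal config fragment. The key names mirror the UniZero Atari configs discussed in this thread, but treat this as an illustration rather than a verbatim excerpt from the repository:

```python
# Sketch of the two context-length settings discussed above.
# Key names follow the thread; the exact config structure in the
# repository may differ between versions.

train_context = dict(
    # Length of the sequence the world model is unrolled over during
    # training. The name is kept for consistency with MuZero.
    num_unroll_steps=10,
    # During collect/eval only the most recent 4 steps of the KV cache
    # are retained, since 4 stacked frames generally suffice for
    # decision-making in Atari tasks.
    infer_context_length=4,
)

# The training context should be at least as long as the inference
# context: the nn.Embedding positional encoding does not extrapolate
# to positions beyond those seen during training.
assert train_context["num_unroll_steps"] >= train_context["infer_context_length"]

print(train_context)
```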

@puyuan1996 puyuan1996 added the algorithm New algorithm label Jul 24, 2024
@Tiikara
Author

Tiikara commented Jul 24, 2024

Thank you very much for your response. I would like to know more: I tried running UniZero myself with default parameters (Atari Pong with stack4), and the envstep graphs in TensorBoard do not match those in the paper. In the paper, the reward starts to increase at around 0.1M env steps. What exactly does envstep mean in the paper?

(screenshot: TensorBoard reward curves attached)

@puyuan1996
Collaborator

The EnvSteps in the paper refer to the total number of interaction steps with the environment. The curve data in our paper comes from pre-refactoring experiments, while the code in the main branch has since been refactored and optimized, so slight differences in the curves are expected, but there should not be significant discrepancies. Please ensure you are using the default configuration (atari_unizero_config.py or atari_unizero_stack4_config.py). Perhaps you could run atari_unizero_config.py once to see the effect and provide the complete training TensorBoard logs so we can analyze the cause. Thank you.

@puyuan1996 puyuan1996 added config New or improved configuration polish Polish algorithms, tests or configs labels Jul 24, 2024
@Tiikara
Author

Tiikara commented Jul 24, 2024

I have conducted a thorough comparison between the parameters specified in the paper and those in the configuration files. Unfortunately, I did not identify any significant discrepancies. I executed atari_unizero_stack4_config.py with only minor modifications: I adjusted the frequency of checkpoint saves and enabled video file recording. These changes should not materially affect the training process or results.

formatted_total_config.zip

@puyuan1996
Collaborator

Thank you for your feedback. We will verify and address the performance issues related to atari_unizero_stack4_config.py within a week. The code using stack1 in the main branch, atari_unizero_config.py, has already been confirmed to perform consistently with the curves presented in the paper. We recommend using atari_unizero_config.py for your tests and research in the meantime. Thank you for your patience.

@puyuan1996 puyuan1996 changed the title Clarification on the relationship between num_unroll_steps and infer_context_length Clarification on the relationship between num_unroll_steps and infer_context_length in UniZero Jul 25, 2024
@Tiikara
Author

Tiikara commented Jul 25, 2024

Thank you for your prompt and helpful reply. I truly appreciate your dedication to addressing this issue. The work you're doing with LightZero is impressive and valuable to the research community.

@Tiikara Tiikara closed this as completed Jul 25, 2024
@Tiikara
Author

Tiikara commented Jul 25, 2024

For clarification: in TensorBoard, do collector_step and the other tabs with the *_step prefix represent envstep, or do they correspond to *_iter? And which data sources (collector, evaluator, etc.) were used for the statistics presented in the paper? I want to make sure I'm interpreting the data correctly. Thank you for your assistance.

@Tiikara Tiikara reopened this Jul 25, 2024
@puyuan1996
Collaborator

Hello, in Tensorboard, tags with the _step prefix indicate the environment steps (EnvStep), while tags with the _iter prefix indicate the training iterations (train iteration). For detailed information, you can refer to this documentation. Unless otherwise stated, the learning curves in the paper generally refer to the curves in the evaluator. Best wishes.
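The tag convention described above can be illustrated with a small helper. This function is purely illustrative (it is not part of LightZero), and the sample tag names are assumptions modeled on the convention the answer describes:

```python
def tag_axis(tag: str) -> str:
    """Classify a TensorBoard tag by the x-axis it is plotted against,
    following the convention described above: '_step' tags correspond to
    environment steps (EnvStep), '_iter' tags to training iterations.
    Illustrative helper only; tag names are hypothetical examples."""
    if tag.endswith("_step") or "_step/" in tag:
        return "env_step"
    if tag.endswith("_iter") or "_iter/" in tag:
        return "train_iter"
    return "unknown"

# The learning curves in the paper generally come from the evaluator,
# plotted against EnvStep.
print(tag_axis("evaluator_step/reward_mean"))  # env_step
print(tag_axis("learner_iter/total_loss"))     # train_iter
```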

@Tiikara
Author

Tiikara commented Jul 26, 2024

Thank you very much for your clear and informative response. I believe this addresses all my current questions, so I'll be closing this thread. Thank you again for your time and support throughout this discussion.

@Tiikara Tiikara closed this as completed Jul 26, 2024