
Clarification on the relationship between num_unroll_steps and infer_context_length in UniZero #248

Closed
Tiikara opened this issue Jul 24, 2024 · 9 comments
Labels
algorithm New algorithm config New or improved configuration polish Polish algorithms, tests or configs

Comments

@Tiikara

Tiikara commented Jul 24, 2024

I've been studying the UniZero implementation and I have a question about two key parameters:

num_unroll_steps = 10
infer_context_length = 4

I noticed that these values are different, and I'm curious about the reasoning behind this design choice. Specifically:

  1. Why is num_unroll_steps set higher than infer_context_length?
  2. Does this configuration affect the training or inference efficiency?
  3. Would there be any drawbacks to setting these values equal to each other?

I'd greatly appreciate any insights you could provide on the rationale behind these parameter choices and their impact on the model's performance and behavior.

Thank you for your time and for creating this interesting algorithm.

@puyuan1996
Collaborator

puyuan1996 commented Jul 24, 2024

  • Hello, thank you very much for your attention.
  • num_unroll_steps is the context length during training (i.e., the length of the training sequence); the name is kept for consistency with the MuZero algorithm, and we may rename it to train_context_length in the future. infer_context_length is the context length during collect/eval (i.e., only the most recent infer_context_length steps of the KV cache are retained for inference).
  • The reason we set num_unroll_steps larger than infer_context_length is that our current implementation uses the classic nn.Embedding for positional encoding rather than more recent schemes such as RoPE, so its extrapolation ability is limited. At the same time, we want the model to see a longer context during training (referencing this paper, predicting further into the future may help it form some global structure), while choosing infer_context_length based on task characteristics at inference time. We set infer_context_length = 4 because stacking 4 frames is generally sufficient for optimal decision-making in Atari tasks. Larger values of both num_unroll_steps and infer_context_length require more CUDA memory and may perform better, so we choose their values by weighing compute cost against task characteristics; an adaptive method is a possible future direction.
  • Thank you again for your attention, and feel free to raise any questions or suggestions at any time.
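The relationship described above can be sketched as a minimal config fragment. The key names mirror the UniZero Atari configs discussed in this thread, but treat this as an illustration rather than a verbatim excerpt from the repository:

```python
# Sketch of the two context-length settings discussed above.
# Key names follow the thread; the exact config structure in the
# repository may differ between versions.

train_context = dict(
    # Length of the sequence the world model is unrolled over during
    # training. The name is kept for consistency with MuZero.
    num_unroll_steps=10,
    # During collect/eval only the most recent 4 steps of the KV cache
    # are retained, since 4 stacked frames generally suffice for
    # decision-making in Atari tasks.
    infer_context_length=4,
)

# The training context should be at least as long as the inference
# context: the nn.Embedding positional encoding does not extrapolate
# to positions beyond those seen during training.
assert train_context["num_unroll_steps"] >= train_context["infer_context_length"]

print(train_context)
```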

@puyuan1996 puyuan1996 added the algorithm New algorithm label Jul 24, 2024
@Tiikara
Author

Tiikara commented Jul 24, 2024

Thank you very much for your response. I would like to know more: I tried running UniZero myself with default parameters (Atari Pong with stack4), and the envstep graphs in TensorBoard do not match those in the paper. In the paper, the reward starts to increase at around 0.1M env steps. What exactly does envstep mean in the paper?

(screenshot: TensorBoard reward curves attached)

@puyuan1996
Collaborator

The EnvSteps in the paper refer to the total number of interaction steps with the environment. The curve data in our paper comes from pre-refactoring experiments, while the code in the main branch has since been refactored and optimized, so slight differences in the curves are expected, but there should not be significant discrepancies. Please ensure you are using the default configuration (atari_unizero_config.py or atari_unizero_stack4_config.py). Perhaps you could run atari_unizero_config.py once to see the effect and provide the complete training TensorBoard logs so we can analyze the cause. Thank you.

@puyuan1996 puyuan1996 added config New or improved configuration polish Polish algorithms, tests or configs labels Jul 24, 2024
@Tiikara
Author

Tiikara commented Jul 24, 2024

I have conducted a thorough comparison between the parameters specified in the paper and those in the configuration files. Unfortunately, I did not identify any significant discrepancies. I executed atari_unizero_stack4_config.py with only minor modifications: I adjusted the frequency of checkpoint saves and enabled video file recording. These changes should not materially affect the training process or results.

formatted_total_config.zip

@puyuan1996
Collaborator

Thank you for your feedback. We will verify and address the performance issues related to atari_unizero_stack4_config.py within a week. The code using stack1 in the main branch, atari_unizero_config.py, has already been confirmed to perform consistently with the curves presented in the paper. We recommend using atari_unizero_config.py for your tests and research in the meantime. Thank you for your patience.

@puyuan1996 puyuan1996 changed the title Clarification on the relationship between num_unroll_steps and infer_context_length Clarification on the relationship between num_unroll_steps and infer_context_length in UniZero Jul 25, 2024
@Tiikara
Author

Tiikara commented Jul 25, 2024

Thank you for your prompt and helpful reply. I truly appreciate your dedication to addressing this issue. The work you're doing with LightZero is impressive and valuable to the research community.

@Tiikara Tiikara closed this as completed Jul 25, 2024
@Tiikara
Author

Tiikara commented Jul 25, 2024

For clarification: in TensorBoard, do collector_step and the other tabs with the *_step prefix represent envstep, or do they correspond to *_iter? And which data sources (collector, evaluator, etc.) were used for the statistics presented in the paper? I want to make sure I'm interpreting the data correctly. Thank you for your assistance.

@Tiikara Tiikara reopened this Jul 25, 2024
@puyuan1996
Collaborator

Hello, in Tensorboard, tags with the _step prefix indicate the environment steps (EnvStep), while tags with the _iter prefix indicate the training iterations (train iteration). For detailed information, you can refer to this documentation. Unless otherwise stated, the learning curves in the paper generally refer to the curves in the evaluator. Best wishes.
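The tag convention described above can be illustrated with a small helper. This function is purely illustrative (it is not part of LightZero), and the sample tag names are assumptions modeled on the convention the answer describes:

```python
def tag_axis(tag: str) -> str:
    """Classify a TensorBoard tag by the x-axis it is plotted against,
    following the convention described above: '_step' tags correspond to
    environment steps (EnvStep), '_iter' tags to training iterations.
    Illustrative helper only; tag names are hypothetical examples."""
    if tag.endswith("_step") or "_step/" in tag:
        return "env_step"
    if tag.endswith("_iter") or "_iter/" in tag:
        return "train_iter"
    return "unknown"

# The learning curves in the paper generally come from the evaluator,
# plotted against EnvStep.
print(tag_axis("evaluator_step/reward_mean"))  # env_step
print(tag_axis("learner_iter/total_loss"))     # train_iter
```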

@Tiikara
Author

Tiikara commented Jul 26, 2024

Thank you very much for your clear and informative response. I believe this addresses all my current questions, so I'll be closing this thread. Thank you again for your time and support throughout this discussion.

@Tiikara Tiikara closed this as completed Jul 26, 2024