question about RotaryPEMultiHeadAttention: rotary_percentage #246
I confirmed that the RotaryPEMultiHeadAttention class contains code that applies the rotary encoding to only a fraction of the head dimensions, controlled by a parameter called rope_percentage.
(See annotated_deep_learning_paper_implementations/labml_nn/transformers/rope/__init__.py, line 205 at commit 285cb37.)
I am curious in what cases you would set rope_percentage to a value less than 1.
(Of course, in experiment.py, rope_percentage is set to 1.0.)
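For context, here is a minimal, self-contained sketch of how a rope_percentage-style split is typically implemented. The names (partial_rope, rotate_half), the tensor layout, and the default base are illustrative assumptions, not the repo's actual code; line 205 of the linked file has the exact implementation.

```python
# Minimal sketch of partial rotary position embeddings.
# NOT the repo's code; names and layout are illustrative assumptions.
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Map feature pairs (x1, x2) -> (-x2, x1) for the rotation."""
    d = x.shape[-1] // 2
    return torch.cat([-x[..., d:], x[..., :d]], dim=-1)


def partial_rope(x: torch.Tensor, rope_percentage: float, base: float = 10_000.0) -> torch.Tensor:
    """
    Apply rotary embeddings to only the first `rope_percentage`
    fraction of the feature dimensions.

    x: tensor of shape (seq_len, batch, heads, d_head)
    """
    seq_len, _, _, d_head = x.shape
    d_rope = int(d_head * rope_percentage)  # number of rotated dims (assumed even)

    # Split into the slice that gets rotated and the slice that passes through
    x_rope, x_pass = x[..., :d_rope], x[..., d_rope:]

    # Standard RoPE frequencies, computed only for the rotated slice
    inv_freq = 1.0 / (base ** (torch.arange(0, d_rope, 2).float() / d_rope))
    positions = torch.arange(seq_len).float()
    theta = torch.einsum("s,f->sf", positions, inv_freq)          # (seq_len, d_rope/2)
    theta = torch.cat([theta, theta], dim=-1)[:, None, None, :]   # (seq_len, 1, 1, d_rope)

    x_rotated = x_rope * theta.cos() + rotate_half(x_rope) * theta.sin()

    # Dimensions beyond d_rope are left untouched: they carry no
    # positional signal, only content.
    return torch.cat([x_rotated, x_pass], dim=-1)
```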
I'm also not sure; I usually set it to 1. I have seen implementations where it's set to 0.5. I guess they do it so that some dimensions never get rotated, which makes it easier for the model to attend based on content alone, with no interference from the positional information.
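To illustrate the point about unrotated dimensions, using the hypothetical partial_rope sketch above: with a head size of 64 and rope_percentage=0.5, only the first 32 dimensions receive positional information, and the remaining 32 pass through unchanged.

```python
x = torch.randn(10, 2, 4, 64)   # (seq_len, batch, heads, d_head), hypothetical shapes

full = partial_rope(x, rope_percentage=1.0)   # all 64 dims rotated
half = partial_rope(x, rope_percentage=0.5)   # only the first 32 dims rotated

# With rope_percentage=0.5, the last 32 dims carry no positional signal
assert torch.allclose(half[..., 32:], x[..., 32:])
```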