Question about accumulated gradients metric #21
By the way, could you please explain the 'compression_weight' in the code? Why is 'compression_weight' for attention set to 36? Does this number have any special significance?
Hi, @Hambaobao
UPop prunes parts with large cumulative gradients of the corresponding learnable masks, which makes masks
On using gradients as an importance metric, you may refer to this paper, although its motivation and specific use of gradients differ considerably from UPop's.
, while each position in learnable mask
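The selection rule described in this reply can be illustrated with a small sketch. It is not taken from the UPop code base: the function names, the toy gradient values, and the use of plain Python lists are all assumptions made for illustration. It only shows the idea of summing each mask position's gradient over training steps and then pruning the positions with the largest totals. One way to read the rule: under plain gradient descent, a position whose gradient stays positive is steadily pushed toward a smaller mask value.

```python
def accumulate_mask_gradients(grad_history):
    """Sum the per-position mask gradients over all recorded steps.

    grad_history is a list of steps; each step is a list holding one
    gradient value per mask position (hypothetical toy data here).
    """
    accumulated = [0.0] * len(grad_history[0])
    for step_grads in grad_history:
        for i, g in enumerate(step_grads):
            accumulated[i] += g
    return accumulated


def select_positions_to_prune(accumulated, num_to_prune):
    """Return the indices with the largest accumulated gradients."""
    ranked = sorted(
        range(len(accumulated)),
        key=lambda i: accumulated[i],
        reverse=True,
    )
    return sorted(ranked[:num_to_prune])


# Toy gradients for a 4-position mask over 3 steps (illustrative only).
grad_history = [
    [0.5, -0.25, 0.125, 0.0],
    [0.5, -0.25, 0.25, 0.0],
    [0.25, -0.5, 0.125, 0.0],
]

accumulated = accumulate_mask_gradients(grad_history)
pruned = select_positions_to_prune(accumulated, num_to_prune=2)
print(accumulated)  # [1.25, -1.0, 0.5, 0.0]
print(pruned)       # [0, 2]
```

In this toy run, positions 0 and 2 have the largest accumulated gradients and are the ones selected for pruning; position 1, whose gradient is consistently negative, is kept.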
Thank you very much for your detailed reply, it has truly been a great help to me.
Dear author,
Hello, I have read your paper and code. UPop uses the cumulative gradient of the mask as the metric for evaluating weight importance. However, I don't understand why UPop prunes the parts with large cumulative gradients. Does this mean that parts with larger cumulative gradients are less important? Is there any related research supporting this, or is it based on intuition? Could you please provide some clarification?
Thank you.