SparseAdam does not work with share_decoder_embeddings option #1015
Did you use multi-GPU or single GPU? In any event, multi-GPU does not support SparseAdam, so use Adam instead. But we may need to fix SparseAdam on single GPU with share_decoder_embeddings.
@vince62s Thanks for your quick response. I was working with a single GPU. Thanks!
Can you please post your command line and the error so that I can have a look? (I think we may drop SparseAdam in the future unless PyTorch supports sparse tensors in its distributed functions.)
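For context, a minimal OpenNMT-py invocation matching the setup described below might look something like this (illustrative only, not the reporter's exact command):

```bash
# Hypothetical repro command, not the reporter's actual one:
python train.py -data data/demo -save_model demo-model \
    -encoder_type transformer -decoder_type transformer \
    -share_decoder_embeddings \
    -optim sparseadam   # fails here; -optim adam avoids the error
```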
Hi,
As far as I know, to train a Transformer MT model the optimizer should be set to sparseadam instead of adam, per #637. However, when I tried to train a Transformer model with the -share_decoder_embeddings option, training failed with the message "SparseAdam does not support dense gradients, please consider Adam instead". Is there any problem with switching the optimizer to adam, as the error message suggests?
Thanks!
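For anyone hitting this, here is a minimal sketch in plain PyTorch (my own illustration, not OpenNMT-py code) of why weight sharing breaks SparseAdam: the tied embedding weight also receives a dense gradient from the output projection, and SparseAdam only accepts sparse gradients.

```python
import torch
import torch.nn as nn

# SparseAdam only handles parameters whose .grad is a sparse tensor,
# e.g. as produced by nn.Embedding(..., sparse=True).
emb = nn.Embedding(10, 4, sparse=True)
opt = torch.optim.SparseAdam(emb.parameters())

emb(torch.tensor([1, 2, 3])).sum().backward()
opt.step()  # OK: emb.weight.grad is sparse

# Tying the embedding to an output projection (conceptually what
# -share_decoder_embeddings does) makes the same weight also receive
# a dense gradient from the linear layer. Sparse + dense accumulates
# into a dense .grad, which SparseAdam refuses.
proj = nn.Linear(4, 10, bias=False)
proj.weight = emb.weight  # weight sharing
emb.weight.grad = None

proj(emb(torch.tensor([1, 2, 3]))).sum().backward()
opt.step()  # RuntimeError: SparseAdam does not support dense gradients...
```

Switching to -optim adam, as the error message (and the maintainer above) suggests, is the straightforward workaround.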