SparseAdam does not support with share_decoder_embeddings option #1015

Closed
wonkeelee opened this issue Oct 28, 2018 · 3 comments

Comments

wonkeelee commented Oct 28, 2018

Hi,

As far as I know, to train a Transformer MT model the optimizer should be set to sparseadam instead of adam, per #637.
However, when I tried to train a Transformer with the -share_decoder_embeddings option, training failed with the error "SparseAdam does not support dense gradients, please consider Adam instead".
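
For context, here is a minimal standalone sketch of what I understand the failure mode to be (this is my own toy reproduction, not the OpenNMT-py code): tying the embedding weight to a dense output projection, which is roughly what -share_decoder_embeddings does, makes the accumulated gradient of that weight dense, and torch.optim.SparseAdam rejects dense gradients.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 100, 16

# sparse=True makes the embedding produce sparse gradients,
# which is the whole point of using SparseAdam.
embedding = nn.Embedding(vocab_size, emb_dim, sparse=True)

# Tie the output projection to the embedding weight
# (roughly what -share_decoder_embeddings does).
projection = nn.Linear(emb_dim, vocab_size, bias=False)
projection.weight = embedding.weight

optimizer = torch.optim.SparseAdam(embedding.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (4, 7))
logits = projection(embedding(tokens))  # dense matmul through the shared weight
logits.sum().backward()                 # sparse + dense contributions -> dense grad

optimizer.step()  # RuntimeError: SparseAdam does not support dense gradients, ...
```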

Is there any problem with switching the optimizer to adam, as the error message suggests?

Thanks!

vince62s (Member) commented

Did you use multi GPU or single GPU?

In any event, multi-GPU does not support SparseAdam, so use Adam instead.

But we may need to fix SparseAdam on single GPU with share_decoder_embeddings;
we may have missed something.
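
For what it's worth, a single-GPU workaround along these lines should be fine, keeping -share_decoder_embeddings but switching the optimizer to adam (data paths and the remaining Transformer options here are placeholders, not from this issue):

```
python train.py -data data/demo -save_model demo-model \
    -encoder_type transformer -decoder_type transformer \
    -share_decoder_embeddings -optim adam
```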

wonkeelee (Author) commented

@vince62s Thanks for your quick response. I was working with a single GPU. Thanks!

vince62s (Member) commented

Can you please post your command line and the error so that I can have a look? (I think we may drop SparseAdam in the future unless PyTorch supports sparse tensors in its distributed functions.)

vince62s changed the title from "[error] Transformer with share_decoder_embeddings option" to "SparseAdam does not support with share_decoder_embeddings option" on Nov 3, 2018