AttributeError: module 'torch.utils' has no attribute 'checkpoint' in gpt-neox/gpt-neox #80

Closed
kinoc opened this issue Jan 23, 2021 · 4 comments · Fixed by #85
Labels
bug Something isn't working

Comments

@kinoc

kinoc commented Jan 23, 2021

While running "deepspeed train_enwik8.py --deepspeed --deepspeed_config ./configs/deepspeed_zero2.json" I got the error "AttributeError: module 'torch.utils' has no attribute 'checkpoint'" in gpt-neox/gpt-neox.
It seems similar to the PyTorch problem described at "attributeerror-module-torch-utils-has-no-attribute-checkpoint".

Quick fix: adding
"from torch.utils.checkpoint import checkpoint"
to the top of the file seems to fix the problem.

StellaAthena added the bug (Something isn't working) label Jan 23, 2021
@StellaAthena
Member

Are you sure you have the versions specified in requirements.txt installed? I just ran this without any issue.

@joshlk
Member

joshlk commented Jan 23, 2021

I get the same error with torch v1.6.0 or v1.7.0. What version are you using?

@StellaAthena
Member

~/gpt-neox# python3
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> c = torch.utils.checkpoint.checkpoint
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torch.utils' has no attribute 'checkpoint'
>>> from torch.utils.checkpoint import checkpoint
>>> checkpoint
<function checkpoint at 0x7fcb633aca60>
>>> 

Well then. I have no idea how this is even possible, but that does seem to be the error. We're about to roll something out; I'll make sure this fix is included.
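A note on the session above, offered as a minimal sketch rather than a statement about PyTorch internals: in Python, importing a package does not import its submodules unless the package's own __init__.py (or some other module that has already been imported) does so. That would explain both the AttributeError after a bare "import torch" and why the same script can run cleanly in an environment where another library happens to import torch.utils.checkpoint first. The example below reproduces the behaviour with a throwaway package; demo_pkg and its layout are hypothetical stand-ins, not part of torch or gpt-neox.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build demo_pkg/utils/checkpoint.py on disk.
    # "demo_pkg" is a hypothetical stand-in for torch, not a real library.
    utils_dir = os.path.join(root, "demo_pkg", "utils")
    os.makedirs(utils_dir)
    for pkg_dir in (os.path.join(root, "demo_pkg"), utils_dir):
        # Empty __init__.py files: the packages never import their submodules.
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(utils_dir, "checkpoint.py"), "w") as f:
        f.write("def checkpoint(fn, *args):\n    return fn(*args)\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # Importing the parent package alone does not load the submodule.
        demo_utils = importlib.import_module("demo_pkg.utils")
        missing_before = not hasattr(demo_utils, "checkpoint")

        # An explicit submodule import loads it and binds it on the parent.
        from demo_pkg.utils.checkpoint import checkpoint
        present_after = hasattr(demo_utils, "checkpoint")
    finally:
        sys.path.remove(root)

print(missing_before, present_after)  # prints: True True
result = checkpoint(lambda x: x + 1, 41)
print(result)  # prints: 42
```

Under that reading, either "import torch.utils.checkpoint" or the "from torch.utils.checkpoint import checkpoint" line from the quick fix makes the attribute lookup safe regardless of import order.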

StellaAthena linked a pull request Jan 23, 2021 that will close this issue
@kinoc
Author

kinoc commented Jan 23, 2021

> Are you sure you have the versions specified in requirements.txt installed? I just ran this without any issue.

einops 0.3.0
torch 1.7.1
torchcontrib 0.0.2
tqdm 4.56.0
transformers 4.2.2
ftfy 5.8
lm-dataformat 0.0.19

The one standout is:
tensorflow-gpu 1.14.0

Locally it runs fine with the added import line, and fails when I remove it.
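For anyone comparing their environment against requirements.txt, the installed versions can also be listed programmatically. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the package names are the ones from the list above, and the helper name installed_versions is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the list above.
PACKAGES = [
    "einops", "torch", "torchcontrib", "tqdm",
    "transformers", "ftfy", "lm-dataformat",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name} {ver if ver is not None else 'not installed'}")
```

The output can then be checked line by line against the pinned versions in requirements.txt.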
