
torch version conflict with the A100/3090 architecture? #5

Open
cheliu-computation opened this issue Nov 2, 2023 · 1 comment

@cheliu-computation
Hi, thanks for your impressive work.

I ran into some issues when I attempted to deploy this repo on my cluster (with 8 A100 GPUs).
First, I requested more than 100 GB of memory but still got a segmentation fault related to a memory leak. Does this mean there is a memory-sensitive operation in the code?

Also, when I checked the package versions, I found that torch is pinned to 1.9, but my torch is 1.13 with CUDA 11.6.
Does this code only work with torch 1.9? Current-architecture GPUs such as the A100/A6000/3090 only support torch >= 1.11.

Looking forward to your response!
Best Regards
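
For context on the version question above, here is a minimal, stdlib-only sketch of the comparison involved: Ampere-generation GPUs (A100/A6000/3090) need sm_80/sm_86 kernels, which ship in torch >= 1.11 builds, so a 1.9 pin cannot run on them. The helper names (`parse_version`, `supports_ampere`) and the "1.11" cutoff as written here are illustrative assumptions, not part of the repo:

```python
def parse_version(v):
    # Keep only the leading numeric release segment,
    # e.g. "1.13.0+cu116" -> (1, 13, 0); drops local build tags like "+cu116".
    core = v.split("+")[0]
    return tuple(int(p) for p in core.split(".") if p.isdigit())

def supports_ampere(torch_version, minimum="1.11"):
    # Hypothetical check: Ampere GPUs (sm_80/sm_86) need torch >= 1.11.
    return parse_version(torch_version) >= parse_version(minimum)

print(supports_ampere("1.9.0"))         # the version pinned in the repo
print(supports_ampere("1.13.0+cu116"))  # the reporter's installed version
```

Tuple comparison makes "1.9" sort below "1.11" correctly, which a plain string comparison would get wrong.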

@chao1224
Owner

chao1224 commented Nov 2, 2023

Hi @cheliu-computation,

Can you provide more details on the command you are running?
By the way, our Python scripts currently run on a single GPU only; DDP support has not been added yet.
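
Since the scripts are single-GPU only, one common pattern on a multi-GPU node is to pin the process to one device before torch is imported. This is a generic sketch, not something from the repo; the device index "0" is an arbitrary choice:

```python
import os

# Pin this process to a single GPU before importing torch, since the
# scripts here are single-GPU only (no DDP). Index "0" is an assumption;
# pick whichever GPU is free on your node.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Setting `CUDA_VISIBLE_DEVICES` must happen before the first `import torch`, because the CUDA context enumerates devices at initialization.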
