Time and Space Limits for Deep Learning on UoS HPCs #18
I suggest the /fastdata (Lustre filesystem) areas, as they don't have quotas and they're the most performant areas for working with larger files. Note that files in those areas will be deleted after 60 days - is that too little time?

Can the training process easily be paused/resumed? If so, that gives you more flexibility in how the workload can be run: it could potentially be split across multiple jobs, possibly using free CPU/GPU cycles (see https://docs.hpc.shef.ac.uk/en/latest/hpc/scheduler/index.html#preemptable-jobs).

If you want to discuss your GPU resource needs in more detail, do get in touch via [email protected].
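A pause/resume workflow can be as simple as writing training state to disk after every epoch and reloading it when the job restarts. Here is a minimal, framework-agnostic sketch of that idea (the checkpoint path, state contents, and placeholder metric are all hypothetical; with ProteinBERT/Keras you would checkpoint model weights via the framework's own saving mechanisms rather than a pickle):

```python
import os
import pickle
import tempfile

# Hypothetical checkpoint location; on the cluster this would sit under /fastdata.
CKPT = os.path.join(tempfile.gettempdir(), "demo_train_state.pkl")

def load_state(path=CKPT):
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"epoch": 0, "loss_history": []}

def save_state(state, path=CKPT):
    """Write the checkpoint atomically so a preempted job never leaves a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # atomic rename on POSIX filesystems

def train(total_epochs=5):
    state = load_state()
    for epoch in range(state["epoch"], total_epochs):
        # ... one real epoch of training would go here ...
        state["loss_history"].append(1.0 / (epoch + 1))  # placeholder metric
        state["epoch"] = epoch + 1
        save_state(state)  # checkpoint after every epoch
    return state

# Demo: run two epochs, pretend the job was preempted, then resume to five.
if os.path.exists(CKPT):
    os.remove(CKPT)  # start the demo from a clean slate
first = train(total_epochs=2)
resumed = train(total_epochs=5)  # picks up at epoch 2, not 0
```

Because each invocation of `train` only runs the epochs the checkpoint has not yet covered, the same script can be submitted repeatedly as short preemptable jobs and will make forward progress each time it is scheduled.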
Hello! I'm looking to retrain/train a larger model inspired by https://github.com/nadavbra/protein_bert. The ProteinBERT GitHub repo has some pretty nice instructions for retraining; I'm just wondering whether I'll run into any issues using that much disk space and that much GPU time!
Thanks a ton,
Brooks